Mosaic: Processing a Trillion-Edge Graph on a Single Machine
Steffen Maass, Changwoo Min, Sanidhya Kashyap, Woonhak Kang, Mohan Kumar, Taesoo Kim
Georgia Institute of Technology
Best Student Paper
April 26, 2017
Steffen Maass Mosaic: Trillion Edges on a Single Machine April 26, 2017 1 / 21
Large-scale graph processing is ubiquitous
  Social networks
  Genome analysis
  Graphs enable Machine Learning
Powerful, heterogeneous machines
  Terabytes of RAM on multiple sockets
  Powerful many-core coprocessors
  Fast, large-capacity Non-volatile Memory
Take advantage of heterogeneous machines to process tera-scale graphs
Table of contents
  1 Graph Processing: Sample Application
  2 Design
      Mosaic Architecture
      Graph Encoding
      API
  3 Evaluation
Graph Processing: Applications
  Community Detection
  Find Common Friends
  Find Shortest Paths
  Estimate Impact of Vertices (webpages, users, ...)
  ...
Mosaic: Design space
Graph Processing has many faces:
  Single Machine
    Out-of-core ⇒ Cheap, but potentially slow
    In memory ⇒ Fast, but limited graph size
  Cluster
    Out-of-core ⇒ Large graphs, but expensive & slow
    In memory ⇒ Large graphs & fast, but very expensive
⇒ Single machine, out-of-core is most cost-effective
⇒ Goal: Good performance and large graphs!
Mosaic: Design goals
Goal: Run algorithms on very large graphs on a single machine using coprocessors
Enabled by:
  Common, familiar API (vertex/edge-centric)
  Encoding:
    Lossless compression
    Cache locality
    Processing on isolated subgraphs
Architecture of Mosaic
  Usage of Xeon Phi & NVMe
  Involvement of Host
[Figure: data flow between NVMe, Xeon Phi, and host. Recoverable labels: global vertex state (<current state>, <next state>), stripped; host processors (Xeon); fetch; receive; per Xeon Phi (×4); meta (I1, I2) and tile (T1, T2) transfer over PCIe; edge processing (×61 cores); NVMe; Xeon Phi; an additional multiplicity label ×6.]
Graph encoding: Idea
Compression
  Split graph into subgraphs, use local (short) identifiers
Cache locality
  Inside subgraphs: Sort by access order
  Between subgraphs: Overlap vertex sets
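The compression idea can be illustrated with a rough size estimate. The sketch below assumes 8-byte global vertex ids and 2-byte local ids; these widths, the function names, and the sample numbers are assumptions for illustration and are not stated on the slides.

```python
import struct

GLOBAL_ID = struct.Struct("<Q")  # assumed width: 8-byte global vertex id
LOCAL_ID = struct.Struct("<H")   # assumed width: 2-byte local vertex id


def global_edge_bytes(num_edges: int) -> int:
    """Bytes needed when every edge stores two global ids."""
    return num_edges * 2 * GLOBAL_ID.size


def tiled_edge_bytes(num_edges: int, num_table_entries: int) -> int:
    """Bytes needed when edges store local ids plus a local -> global table."""
    edges = num_edges * 2 * LOCAL_ID.size
    meta = num_table_entries * GLOBAL_ID.size  # one global id per local id
    return edges + meta


if __name__ == "__main__":
    # Hypothetical subgraph: one million edges over 65,536 table entries.
    print(global_edge_bytes(1_000_000), tiled_edge_bytes(1_000_000, 65_536))
```

Under these assumed widths, the per-edge cost drops from 16 bytes to 4 bytes, and the lookup table is paid once per subgraph rather than once per edge.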
Background: Column first
  Locality for write
  Multiple sequential reads
[Figure: global adjacency matrix with source vertices 1-12 as rows and target vertices 1-12 as columns, divided into partitions P11 to P44 (S = 3)]
⇒ Problem: No locality when switching column
Background: Row first
  Locality for read
  Multiple sequential writes
[Figure: global adjacency matrix with source vertices 1-12 as rows and target vertices 1-12 as columns, divided into partitions P11 to P44 (S = 3)]
⇒ Problem: No locality when switching row
Background: Hilbert order
  Space-filling curve
  Provides locality between adjacent data points
[Figure: global adjacency matrix with source vertices 1-12 as rows and target vertices 1-12 as columns, divided into partitions P11 to P44 (S = 3)]
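As a minimal sketch of how a Hilbert order over the 4 × 4 partition grid could be computed, the code below uses the well-known iterative coordinate-to-index conversion for a Hilbert curve. The function names, the orientation (x as column, y as row), and the starting corner are assumptions; the slides do not specify which variant of the curve is used.

```python
def hilbert_xy2d(n: int, x: int, y: int) -> int:
    """Return the position of cell (x, y) along a Hilbert curve.

    n is the grid side length and must be a power of two.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_partition_order(n: int) -> list[str]:
    """List partition labels P<row><col> sorted by Hilbert index."""
    cells = [(x, y) for y in range(n) for x in range(n)]
    cells.sort(key=lambda c: hilbert_xy2d(n, c[0], c[1]))
    return [f"P{y + 1}{x + 1}" for x, y in cells]


if __name__ == "__main__":
    print(" ".join(hilbert_partition_order(4)))
```

With this orientation, every partition in the resulting sequence is a direct horizontal or vertical neighbor of the one before it, which is the locality property named on the slide.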
From global to local: Tiles
Convert graph to set of tiles
  1) Start with adjacency matrix
  2) Use first edge in tile T1
  3) Consume as many edges as possible
  4) Next edges do not fit in T1, construct T2
[Figure: global adjacency matrix (source vertices 1-12 as rows, target vertices 1-12 as columns, partitions P11 to P44, S = 3) shown next to Tile-1 (T1) with meta (I1) and Tile-2 (T2) with meta (I2). Legend: circled numbers are local vertex ids; pairs of the form (local, global) in the meta blocks map local ids to global ids; filled numbers give the local edge store order.]
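The four steps above can be sketched as a greedy packing loop. This is an illustrative sketch, not the authors' implementation: the `Tile` class, the `build_tiles` function, and the rule that a tile is full once it would need more than `max_local` distinct sources or targets are all assumptions. The input edges are assumed to arrive already sorted in the desired traversal order, and `max_local` is assumed to be at least 1.

```python
from dataclasses import dataclass, field


@dataclass
class Tile:
    """One tile: edges in local ids plus local -> global lookup tables."""
    src_ids: list[int] = field(default_factory=list)  # local src id -> global id
    dst_ids: list[int] = field(default_factory=list)  # local dst id -> global id
    edges: list[tuple[int, int]] = field(default_factory=list)  # (local src, local dst)


def build_tiles(edges, max_local):
    """Greedily pack (src, dst) edges into tiles, in the order given."""
    tiles = []
    tile, src_map, dst_map = Tile(), {}, {}
    for src, dst in edges:
        need_src = src not in src_map
        need_dst = dst not in dst_map
        # The edge does not fit if it would exceed the local id budget.
        if (len(src_map) + need_src > max_local
                or len(dst_map) + need_dst > max_local):
            tiles.append(tile)
            tile, src_map, dst_map = Tile(), {}, {}
        if src not in src_map:
            src_map[src] = len(tile.src_ids)
            tile.src_ids.append(src)
        if dst not in dst_map:
            dst_map[dst] = len(tile.dst_ids)
            tile.dst_ids.append(dst)
        tile.edges.append((src_map[src], dst_map[dst]))
    if tile.edges:
        tiles.append(tile)
    return tiles


def decode(tile):
    """Translate a tile's local edges back to global ids (lossless check)."""
    return [(tile.src_ids[s], tile.dst_ids[d]) for s, d in tile.edges]
```

Decoding every tile and concatenating the results reproduces the original edge list, which is one way to check that the encoding is lossless.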
Locality with Hilbert-ordered tiles
  Overlapping sets of sources and targets
[Figure: global adjacency matrix with source vertices 1-12 as rows and target vertices 1-12 as columns, divided into partitions P11 to P44 (S = 3), with edges numbered in store order]
⇒ Better locality than row-first or column-first
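One way to make the locality claim concrete is to count, for each traversal order of the 4 × 4 partition grid, how many consecutive partition pairs share a row (same source range) or a column (same target range). The metric and all function names below are assumptions chosen for illustration; the Hilbert curve variant is the standard iterative index-to-coordinate conversion.

```python
def hilbert_d2xy(n, d):
    """Map position d along a Hilbert curve to (x, y) in an n x n grid.

    n must be a power of two.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the quadrant before placing it.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def row_first(n):
    return [(x, y) for y in range(n) for x in range(n)]


def column_first(n):
    return [(x, y) for x in range(n) for y in range(n)]


def hilbert(n):
    return [hilbert_d2xy(n, d) for d in range(n * n)]


def shared_transitions(order):
    """Count consecutive partitions that share a column (x) or a row (y)."""
    return sum(a[0] == b[0] or a[1] == b[1] for a, b in zip(order, order[1:]))


if __name__ == "__main__":
    for name, fn in [("row-first", row_first),
                     ("column-first", column_first),
                     ("hilbert", hilbert)]:
        print(name, shared_transitions(fn(4)), "of", 4 * 4 - 1)
```

In this sketch, row-first and column-first each lose the shared range at every row or column switch, while the Hilbert order keeps either the source range or the target range in common across every step.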