Evaluation of Parallel Graph Loading Techniques
Manuel Then, Moritz Kaufmann, Alfons Kemper, Thomas Neumann
Technical University of Munich, Chair of Database Systems
Manuel Then (TUM) | Evaluation of Parallel Graph Loading Techniques 3
General Graph Loading Pipeline
Goal: Efficiently load a given graph dataset for explorative analytics
Read
• Parse edges and create relabeling
• Write edges to worker-local buffer
Sync
• Find unique vertices
• Count neighbors
Write
• Create final graph data structure
• Apply final relabeling
Analytics
• The actual analytics work
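The Read, Sync and Write phases can be sketched in a few lines of C++. This is a minimal single-threaded illustration, not the implementation evaluated in the talk: the names `ExternalId`, `DenseId`, `InputEdge`, `CsrGraph` and `loadGraph` are hypothetical, a CSR adjacency array is assumed as the target data structure, and one shared buffer stands in for the worker-local buffers. Because there is only one worker, the relabeling is already final after the Read phase.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical types for this sketch; none of these names come from the slides.
using ExternalId = std::uint64_t;  // vertex identifier as found in the input
using DenseId = std::uint32_t;     // relabeled identifier in [0, numVertices)
using InputEdge = std::pair<ExternalId, ExternalId>;

struct CsrGraph {
    std::vector<std::size_t> offsets;     // numVertices + 1 entries
    std::vector<DenseId> targets;         // one entry per edge
    std::vector<ExternalId> originalIds;  // dense id -> identifier from the input
};

CsrGraph loadGraph(const std::vector<InputEdge>& edges) {
    // Read: create the relabeling and write relabeled edges to a buffer.
    std::unordered_map<ExternalId, DenseId> relabel;
    std::vector<ExternalId> originalIds;
    std::vector<std::pair<DenseId, DenseId>> buffer;
    buffer.reserve(edges.size());
    auto denseIdOf = [&](ExternalId id) -> DenseId {
        // The next dense id must be representable in DenseId.
        assert(originalIds.size() <= std::numeric_limits<DenseId>::max());
        auto result = relabel.emplace(id, static_cast<DenseId>(originalIds.size()));
        if (result.second) originalIds.push_back(id);  // first time this vertex is seen
        return result.first->second;
    };
    for (const InputEdge& e : edges) {
        const DenseId src = denseIdOf(e.first);
        const DenseId dst = denseIdOf(e.second);
        buffer.emplace_back(src, dst);
    }

    // Sync: the unique vertices are known now; count the neighbors of each one.
    const std::size_t numVertices = originalIds.size();
    std::vector<std::size_t> offsets(numVertices + 1, 0);
    for (const auto& e : buffer) ++offsets[static_cast<std::size_t>(e.first) + 1];
    // Prefix sum turns the per-vertex counts into start offsets.
    for (std::size_t v = 0; v < numVertices; ++v) offsets[v + 1] += offsets[v];

    // Write: scatter the edge targets into the final adjacency array.
    std::vector<DenseId> targets(buffer.size());
    std::vector<std::size_t> cursor(offsets.begin(), offsets.end() - 1);
    for (const auto& e : buffer) targets[cursor[e.first]++] = e.second;

    return CsrGraph{std::move(offsets), std::move(targets), std::move(originalIds)};
}
```

In a parallel loader each worker would fill its own buffer during Read, and the Sync phase would merge the per-worker vertex sets and neighbor counts before the Write phase starts.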
Scenario-specific Graph Loading
Problem: The optimal way of loading the graph depends on various factors:
• Format of the graph data
• Source of the data
• Properties of the input data
• Target graph data structure
• Execution machine
The graph loading pipeline must be adapted to the scenario at hand
General Graph Loading Pipeline
Questions that shape the pipeline for a given scenario:
• Identifier data type: binary, decimal, or string?
• Random access possible?
• Can input data be read multiple times?
• Explicit vertex list available?
• Which data structure to generate?
Parsers
[Slide graphic: scale marked 2x, 20x, 200x]
Binary reader
• No parsing necessary => directly copy vertex identifiers
• Every edge same size => work splitting trivial
Library-provided decimal parsing
• Readily available for many languages
• We evaluated C++'s stream operator and strtol
• Varying edge length => work splitting more complex
Iterative decimal parsing
• Multiply by ten and add character's respective digit
Vectorized decimal parsing
• Leverage wide vector units for identifier parsing
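Two of the points above lend themselves to a short illustration: iterative decimal parsing, and why varying edge length complicates work splitting for text input. The C++ sketch below is not the evaluated code. It assumes a well-formed edge list with one "source target" pair of unsigned decimal numbers per line, no blank lines, and no overflow checking. The names `Edge`, `parseDecimal`, `lineAlignedBoundaries`, `parseChunk` and `parseEdgeList` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using Edge = std::pair<std::uint64_t, std::uint64_t>;

// Iterative decimal parsing: multiply by ten and add each character's digit value.
// Advances `pos` past the digits that were consumed. No sign or overflow handling.
std::uint64_t parseDecimal(const char*& pos, const char* end) {
    std::uint64_t value = 0;
    while (pos != end && *pos >= '0' && *pos <= '9') {
        value = value * 10 + static_cast<std::uint64_t>(*pos - '0');
        ++pos;
    }
    return value;
}

// Work splitting for text input: edges vary in length, so an even byte split can
// land in the middle of a line. Each split point is moved forward to the start of
// the next line. Returns parts + 1 non-decreasing offsets; chunks may be empty.
std::vector<std::size_t> lineAlignedBoundaries(const std::string& text, std::size_t parts) {
    assert(parts > 0);
    std::vector<std::size_t> bounds;
    bounds.push_back(0);
    for (std::size_t i = 1; i < parts; ++i) {
        std::size_t guess = text.size() * i / parts;
        if (guess < bounds.back()) guess = bounds.back();
        const std::size_t newline = text.find('\n', guess);
        bounds.push_back(newline == std::string::npos ? text.size() : newline + 1);
    }
    bounds.push_back(text.size());
    return bounds;
}

// Parses all edges in text[begin, end); `begin` must be the start of a line.
std::vector<Edge> parseChunk(const std::string& text, std::size_t begin, std::size_t end) {
    std::vector<Edge> edges;
    const char* pos = text.data() + begin;
    const char* stop = text.data() + end;
    while (pos != stop) {
        const std::uint64_t src = parseDecimal(pos, stop);
        while (pos != stop && (*pos == ' ' || *pos == '\t')) ++pos;  // skip separator
        const std::uint64_t dst = parseDecimal(pos, stop);
        edges.emplace_back(src, dst);
        while (pos != stop && *pos != '\n') ++pos;  // skip the rest of the line
        if (pos != stop) ++pos;                     // skip the newline itself
    }
    return edges;
}

// Parses the chunks one after another; the chunks are independent, so each one
// could be handed to its own worker thread instead.
std::vector<Edge> parseEdgeList(const std::string& text, std::size_t parts) {
    const std::vector<std::size_t> bounds = lineAlignedBoundaries(text, parts);
    std::vector<Edge> edges;
    for (std::size_t i = 0; i + 1 < bounds.size(); ++i) {
        const std::vector<Edge> chunk = parseChunk(text, bounds[i], bounds[i + 1]);
        edges.insert(edges.end(), chunk.begin(), chunk.end());
    }
    return edges;
}
```

With fixed-size binary edges the boundary search is unnecessary, because every multiple of the edge size is a valid split point. A library routine such as strtol could replace `parseDecimal` here, since it also reports where parsing stopped.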