
Evaluation of Parallel Graph Loading Techniques


  1. Evaluation of Parallel Graph Loading Techniques
     Manuel Then, Moritz Kaufmann, Alfons Kemper, Thomas Neumann
     Technical University of Munich, Chair of Database Systems

  4. General Graph Loading Pipeline
     Goal: Efficiently load a given graph dataset for explorative analytics
     Read
     • Parse edges and create relabeling
     • Write edges to worker-local buffer
     Sync
     • Find unique vertices
     • Count neighbors
     Write
     • Create final graph data structure
     • Apply final relabeling
     Analytics
     • The actual analytics work
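The Read, Sync and Write stages above can be sketched in a few lines of C++. This is a minimal single-worker illustration and not the authors' implementation. It assumes a whitespace-separated decimal edge list as input and a CSR (compressed sparse row) target structure, and it derives the relabeling from the sorted list of unique identifiers during Sync. The names `CsrGraph`, `readEdges` and `buildGraph` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using Edge = std::pair<uint64_t, uint64_t>;

// Hypothetical target structure (CSR); the slides leave this choice open.
struct CsrGraph {
    std::vector<uint64_t> originalIds;  // dense label -> original identifier
    std::vector<uint32_t> offsets;      // vertex count + 1 entries
    std::vector<uint32_t> targets;      // relabeled neighbor lists, concatenated
};

// Read: parse "src dst" pairs into a buffer. A parallel loader would run this
// once per worker, each on its own input chunk with its own local buffer.
std::vector<Edge> readEdges(const std::string& text) {
    std::vector<Edge> buffer;
    std::istringstream in(text);
    uint64_t src = 0, dst = 0;
    while (in >> src >> dst) buffer.emplace_back(src, dst);
    return buffer;
}

CsrGraph buildGraph(const std::vector<Edge>& edges) {
    CsrGraph g;
    // Sync: find unique vertices. The position in the sorted list is the dense label.
    for (const Edge& e : edges) {
        g.originalIds.push_back(e.first);
        g.originalIds.push_back(e.second);
    }
    std::sort(g.originalIds.begin(), g.originalIds.end());
    g.originalIds.erase(std::unique(g.originalIds.begin(), g.originalIds.end()),
                        g.originalIds.end());
    auto dense = [&g](uint64_t id) {
        return static_cast<uint32_t>(
            std::lower_bound(g.originalIds.begin(), g.originalIds.end(), id) -
            g.originalIds.begin());
    };
    // Sync: count the neighbors of every vertex.
    g.offsets.assign(g.originalIds.size() + 1, 0);
    for (const Edge& e : edges) ++g.offsets[dense(e.first) + 1];
    // Write: turn the counts into offsets with a prefix sum.
    for (std::size_t v = 0; v + 1 < g.offsets.size(); ++v) g.offsets[v + 1] += g.offsets[v];
    // Write: place the relabeled targets into their final slots.
    std::vector<uint32_t> cursor(g.offsets.begin(), g.offsets.end() - 1);
    g.targets.resize(edges.size());
    for (const Edge& e : edges) g.targets[cursor[dense(e.first)]++] = dense(e.second);
    assert(g.offsets.back() == g.targets.size());
    return g;
}
```

Calling `buildGraph(readEdges("10 20\n10 30\n30 10\n"))` relabels the identifiers 10, 20 and 30 to 0, 1 and 2 and stores two neighbors for vertex 0 and one for vertex 2.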

  5. Scenario-specific Graph Loading
     Problem: The optimal way of loading the graph depends on various factors:
     • Format of the graph data
     • Source of the data
     • Properties of the input data
     • Target graph data structure
     • Execution machine
     Graph loading pipeline must be adapted to the scenario at hand

  12. General Graph Loading Pipeline
      Goal: Efficiently load a given graph dataset for explorative analytics
      Stages as on slide 4: Read, Sync, Write, Analytics
      Questions annotated on the pipeline:
      • Identifier data type? binary, decimal, string?
      • Can input data be read multiple times?
      • Random access possible?
      • Explicit vertex list available?
      • Which data structure to generate?

  15. Parsers
      Binary reader
      • No parsing necessary => directly copy vertex identifiers
      • Every edge same size => work splitting trivial
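A minimal sketch of the binary case, assuming each edge is stored as two 32-bit unsigned identifiers in native byte order (the slides do not specify the layout). Because every record has the same size, a worker's share is plain index arithmetic and loading is a byte copy. `BinaryEdge`, `edgeRangeBegin` and `loadBinaryEdges` are hypothetical names, and the loop over workers runs sequentially here as a stand-in for real threads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed on-disk record: two 32-bit identifiers, native byte order.
struct BinaryEdge {
    uint32_t src;
    uint32_t dst;
};
static_assert(sizeof(BinaryEdge) == 8, "edge records are assumed to be 8 bytes");

// First edge index owned by a worker; worker w owns [begin(w), begin(w + 1)).
std::size_t edgeRangeBegin(std::size_t edgeCount, unsigned worker, unsigned workers) {
    return edgeCount * worker / workers;
}

std::vector<BinaryEdge> loadBinaryEdges(const std::vector<unsigned char>& bytes,
                                        unsigned workers) {
    assert(workers > 0);
    assert(bytes.size() % sizeof(BinaryEdge) == 0);
    const std::size_t edgeCount = bytes.size() / sizeof(BinaryEdge);
    std::vector<BinaryEdge> edges(edgeCount);
    // Each iteration is independent, so it could be handed to its own thread.
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = edgeRangeBegin(edgeCount, w, workers);
        const std::size_t end = edgeRangeBegin(edgeCount, w + 1, workers);
        if (begin == end) continue;
        // No parsing: the identifiers are copied as they are.
        std::memcpy(&edges[begin], bytes.data() + begin * sizeof(BinaryEdge),
                    (end - begin) * sizeof(BinaryEdge));
    }
    return edges;
}
```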

  16. Parsers
      Library-provided decimal parsing
      • Readily available for many languages
      • We evaluated C++’s stream operator and strtol
      • Varying edge length => work splitting more complex
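To illustrate why varying edge length complicates work splitting, the sketch below cuts the input at approximate byte positions and moves each cut forward to the next line start before parsing with `std::strtol`. It assumes well-formed input with one "src dst" pair per line; only `strtol` is shown, not the stream operator. `alignToLineStart`, `parseChunk` and `splitAndParse` are hypothetical names, and the chunks are processed sequentially as a stand-in for parallel workers.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

using Edge = std::pair<long, long>;

// Move a tentative split position forward to the start of the next line,
// so that no edge is cut in half.
std::size_t alignToLineStart(const std::string& text, std::size_t pos) {
    if (pos == 0) return 0;
    if (pos >= text.size()) return text.size();
    // Searching from pos - 1 keeps pos unchanged if it already starts a line.
    const std::size_t newline = text.find('\n', pos - 1);
    return newline == std::string::npos ? text.size() : newline + 1;
}

// Parse all edges whose lines start inside [begin, end).
std::vector<Edge> parseChunk(const std::string& text, std::size_t begin, std::size_t end) {
    std::vector<Edge> edges;
    const char* cursor = text.c_str() + begin;
    const char* limit = text.c_str() + end;
    for (;;) {
        // Skip separators by hand so strtol never runs into the next chunk.
        while (cursor < limit && std::isspace(static_cast<unsigned char>(*cursor))) ++cursor;
        if (cursor >= limit) break;
        char* next = nullptr;
        const long src = std::strtol(cursor, &next, 10);
        if (next == cursor) break;  // not a number: stop on malformed input
        cursor = next;
        const long dst = std::strtol(cursor, &next, 10);
        if (next == cursor) break;
        cursor = next;
        edges.emplace_back(src, dst);
    }
    return edges;
}

std::vector<Edge> splitAndParse(const std::string& text, unsigned workers) {
    assert(workers > 0);
    std::vector<Edge> all;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = alignToLineStart(text, text.size() * w / workers);
        const std::size_t end = alignToLineStart(text, text.size() * (w + 1) / workers);
        const std::vector<Edge> part = parseChunk(text, begin, end);
        all.insert(all.end(), part.begin(), part.end());
    }
    return all;
}
```

Because both ends of every chunk are aligned with the same rule, each line is owned by exactly one worker even when the lines differ in length.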

  18. Parsers
      Iterative decimal parsing
      • Multiply by ten and add character’s respective digit
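The multiply-and-add loop can be sketched as follows. This is an illustrative version that assumes unsigned ASCII decimal identifiers separated by a space or tab, one edge per line, and it does not check for overflow. `parseDecimal` and `parseEdge` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Parse an unsigned decimal number starting at cursor and advance cursor
// past its digits. Returns 0 and leaves cursor unchanged if there is no digit.
inline uint64_t parseDecimal(const char*& cursor, const char* end) {
    assert(cursor <= end);
    uint64_t value = 0;
    while (cursor != end && *cursor >= '0' && *cursor <= '9') {
        // Multiply by ten and add the digit that the character stands for.
        value = value * 10 + static_cast<uint64_t>(*cursor - '0');
        ++cursor;
    }
    return value;
}

// Parse one "src dst" line. Returns false if either number is missing.
inline bool parseEdge(const char*& cursor, const char* end, uint64_t& src, uint64_t& dst) {
    const char* start = cursor;
    src = parseDecimal(cursor, end);
    if (cursor == start) return false;
    while (cursor != end && (*cursor == ' ' || *cursor == '\t')) ++cursor;
    start = cursor;
    dst = parseDecimal(cursor, end);
    if (cursor == start) return false;
    while (cursor != end && (*cursor == '\n' || *cursor == '\r')) ++cursor;
    return true;
}
```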

  21. Parsers
      Vectorized decimal parsing
      • Leverage wide vector units for identifier parsing
      Slide annotations (unlabeled in this transcript): 2x, 20x, 200x
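The slides do not show how the vectorized parser works. As a scaled-down stand-in, the sketch below uses SWAR (SIMD within a register) to convert exactly eight ASCII digits with a handful of 64-bit operations instead of eight loop iterations. It does not use wide vector units, it assumes a little-endian machine, and the names `isEightDigits` and `parseEightDigits` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// True if the eight bytes at chars are all ASCII digits.
inline bool isEightDigits(const char* chars) {
    uint64_t val;
    std::memcpy(&val, chars, sizeof(val));
    // A digit has high nibble 3 and still has high nibble 3 after adding 6.
    return ((val & 0xF0F0F0F0F0F0F0F0ULL) |
            (((val + 0x0606060606060606ULL) & 0xF0F0F0F0F0F0F0F0ULL) >> 4)) ==
           0x3333333333333333ULL;
}

// Convert exactly eight ASCII digits to their value (little-endian only).
inline uint32_t parseEightDigits(const char* chars) {
    assert(isEightDigits(chars));
    uint64_t val;
    std::memcpy(&val, chars, sizeof(val));
    const uint64_t mask = 0x000000FF000000FFULL;
    const uint64_t mul1 = 0x000F424000000064ULL;  // 100 + (1000000 << 32)
    const uint64_t mul2 = 0x0000271000000001ULL;  // 1 + (10000 << 32)
    val -= 0x3030303030303030ULL;    // ASCII -> digit value in every byte
    val = (val * 10) + (val >> 8);   // even bytes now hold two-digit pairs
    // Combine the four pairs with weights 1000000, 10000, 100 and 1.
    val = (((val & mask) * mul1) + (((val >> 16) & mask) * mul2)) >> 32;
    return static_cast<uint32_t>(val);
}
```

A full parser would still need a scalar path for identifiers that are shorter or longer than eight digits.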
