parallel dataflow graph coloring


  1. Parallel Dataflow Graph Coloring. Ahmet Erdem Sarıyüce (1), Erik Saule (2), and Ümit V. Çatalyürek (1, 3). esaule@uncc.edu, {aerdem,umit}@bmi.osu.edu. (1) Department of Biomedical Informatics, The Ohio State University; (2) Department of Computer Science, University of North Carolina at Charlotte; (3) Department of Electrical and Computer Engineering, The Ohio State University. Scheduling in Aussois^W Dagstuhl^W^W^W Algorithms and Scheduling Techniques for Exascale Systems. Erik Saule (UNCC), Parallel Dataflow Coloring, Dagstuhl 2013, 1 / 21

  2. Outline: (1) Parallel Graph Coloring, (2) Dataflow Graph Coloring, (3) What's the link with scheduling?, (4) Conclusion.

  3. The Graph Coloring Problem. Definition: coloring a graph consists of assigning a color (an integer) to each vertex so that no two adjacent vertices have the same color. Complexity: finding a coloring with the minimum number of colors is NP-hard, and there is no approximation within $|V|^{1-\epsilon}$. The greedy algorithm returns a solution with at most $1 + \Delta$ colors, where $\Delta$ is the maximum vertex degree. [Figure: example graph on vertices 0 to 5.]

  4. Graph Coloring Algorithm. First Fit algorithm: pick a vertex and assign it the first available color, then pick another one. There exists a vertex ordering which leads to an optimal coloring. Many derivative algorithms: Largest First ordering; Smallest Last ordering; dynamic orderings; Least Used instead of First Fit; an iterated algorithm to do local descent. Today, let's talk about the natural one.
     Algorithm 1: Sequential greedy coloring.
       Data: G = (V, E)
       for each v ∈ V do
         for each w ∈ adj(v) do
           forbiddenColors[color[w]] ← v
         color[v] ← min { i > 0 : forbiddenColors[i] ≠ v }
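Algorithm 1 can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the adjacency-list representation, the function name `greedy_coloring`, and the small example graph are assumptions made for the example.

```python
def greedy_coloring(adj):
    """First Fit coloring in natural (ID) order; colors are integers >= 1.

    adj[v] is the list of neighbors of vertex v (assumed representation).
    """
    n = len(adj)
    color = [0] * n               # 0 means "not colored yet"
    forbidden = [-1] * (n + 2)    # forbidden[c] == v: color c is used next to v
    for v in range(n):
        for w in adj[v]:
            # Uncolored neighbors mark slot 0, which is never chosen.
            forbidden[color[w]] = v
        c = 1
        while forbidden[c] == v:  # smallest color not used by a neighbor
            c += 1
        color[v] = c
    return color


# Hypothetical example graph: a triangle 0-1-2 with a tail 2-3-4.
example = [[1, 2], [0, 2], [0, 1, 3], [2, 4], [3]]
coloring = greedy_coloring(example)
```

The `forbidden` array is stamped with the current vertex ID rather than cleared, so each vertex costs time proportional to its degree, as in the pseudocode.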

  5. Parallel Speculative Graph Coloring (Shared Memory). [Figure: the example graph on vertices 0 to 5, shown three times.] At least two passes. More if unlucky (in practice 2 + ε).
     Algorithm 2: TentativeColoring
       Data: G = (V, E), Visit ⊂ V, color[1 : |V|]
       maxcolor ← 1
       localMC ← 1
       for each v ∈ Visit in parallel do
         for each w ∈ adj(v) do
           localFC[color[w]] ← v
         color[v] ← min { i > 0 : localFC[i] ≠ v }
         if color[v] > localMC then
           localMC ← color[v]
       maxcolor ← Reduce(max) localMC
       return maxcolor
     Algorithm 3: DetectConflict
       Data: G = (V, E), Visit ⊂ V, color[1 : |V|]
       Conflict ← ∅
       for each v ∈ Visit in parallel do
         for each w ∈ adj(v) do
           if color[v] = color[w] then
             if v < w then
               atomic Conflict ← Conflict ∪ {v}
       return Conflict
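The interplay of the two algorithms can be illustrated with a sequential simulation. The sketch below assumes a worst-case interleaving in which every vertex of a round reads the colors as they were at the start of that round; that snapshot model, the function names, and the triangle example are assumptions for illustration, not part of the slides.

```python
def tentative_coloring(adj, visit, color):
    """Recolor every vertex in `visit` against a start-of-round snapshot."""
    snapshot = list(color)  # models threads that do not see each other's writes
    for v in visit:
        taken = {snapshot[w] for w in adj[v]}
        c = 1
        while c in taken:
            c += 1
        color[v] = c
    return max(color)


def detect_conflicts(adj, visit, color):
    """Return the lower-ID endpoint of every monochromatic edge."""
    return [v for v in visit
            if any(color[v] == color[w] and v < w for w in adj[v])]


def speculative_coloring(adj):
    color = [0] * len(adj)
    visit = list(range(len(adj)))
    rounds = 0
    while visit:
        tentative_coloring(adj, visit, color)
        visit = detect_conflicts(adj, visit, color)
        rounds += 1
    return color, rounds


# Hypothetical example: a triangle, the worst case for the snapshot model.
colors, rounds = speculative_coloring([[1, 2], [0, 2], [0, 1]])
```

The loop terminates because the highest-ID vertex in `visit` is never reported as a conflict, so the set shrinks every round. Real executions usually see far fewer conflicts than this pessimistic model.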

  6. Outline: (1) Parallel Graph Coloring, (2) Dataflow Graph Coloring, (3) What's the link with scheduling?, (4) Conclusion.

  7. Parallel Dataflow Algorithm. Principle: in a dataflow algorithm, the generation of a result triggers the computation of the next tasks. Dataflow coloring: the idea is to pick an absolute order of the vertices, and each vertex only considers the colors of the vertices with IDs smaller than its own. [Figure: example graph on vertices 0 to 5.] 0 and 1 can be executed concurrently; 2 and 3 can be executed concurrently; 4 and 5 can be executed concurrently. Not speculative, so only one pass.
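The dependency structure behind dataflow coloring can be made explicit by grouping vertices into waves, where a vertex's wave is one more than the largest wave among its lower-ID neighbors. The sketch below is sequential and only illustrates which vertices are independent; the function name and the six-vertex graph are hypothetical (the graph is chosen to reproduce the 0/1, 2/3, 4/5 grouping, and is not necessarily the graph on the slide).

```python
def dataflow_coloring(adj):
    """Color in dependency waves; returns (color, waves)."""
    n = len(adj)
    level = [0] * n
    for v in range(n):
        # A vertex depends only on neighbors with a smaller ID.
        level[v] = 1 + max((level[w] for w in adj[v] if w < v), default=0)
    waves = [[] for _ in range(max(level, default=0))]
    for v in range(n):
        waves[level[v] - 1].append(v)
    color = [0] * n
    for wave in waves:
        # Vertices of one wave are independent: any order (here reversed)
        # gives the same colors as the sequential greedy algorithm.
        for v in reversed(wave):
            taken = {color[w] for w in adj[v] if w < v}
            c = 1
            while c in taken:
                c += 1
            color[v] = c
    return color, waves


# Hypothetical graph with edges 0-2, 1-2, 1-3, 2-4, 3-4, 3-5.
graph = [[2], [2, 3], [0, 1, 4], [1, 4, 5], [2, 3], [3]]
color, waves = dataflow_coloring(graph)
```

Computing the waves up front costs an extra pass over the graph, so this is a way to inspect the available parallelism rather than a replacement for the one-pass scheme.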

  8. Two approaches. Pick the vertices in some order. What happens when you pick a vertex whose higher-priority neighbors have not yet been assigned a color? Recursive Dataflow: you recursively process the neighbor; no waiting time; some form of "workstealing" algorithm; complex synchronisation; higher memory allocation (or potentially redundant work). (Direct) Dataflow: you wait; no redundant work; simpler worksharing constraint; but maybe you waste time waiting.
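The recursive variant can be sketched sequentially: when a picked vertex has an uncolored lower-ID neighbor, that neighbor is processed first. The sketch assumes that a lower ID means higher priority, and it leaves out all thread synchronisation; the function name, the pick order, and the example graph are hypothetical.

```python
def recursive_dataflow_coloring(adj, pick_order):
    """Color vertices in any pick order, recursing into lower-ID neighbors."""
    color = [0] * len(adj)

    def process(v):
        if color[v]:
            return                      # already colored, nothing to redo
        for w in adj[v]:
            if w < v:
                process(w)              # resolve the dependency instead of waiting
        taken = {color[w] for w in adj[v] if w < v}
        c = 1
        while c in taken:
            c += 1
        color[v] = c

    for v in pick_order:
        process(v)
    return color


# Hypothetical graph with edges 0-2, 1-2, 1-3, 2-4, 3-4, 3-5,
# picked in reverse ID order to force recursion.
graph = [[2], [2, 3], [0, 1, 4], [1, 4, 5], [2, 3], [3]]
colors = recursive_dataflow_coloring(graph, pick_order=[5, 4, 3, 2, 1, 0])
```

The recursion always moves to a strictly smaller ID, so it terminates, and the result matches the natural-order greedy coloring regardless of the pick order. On long dependency chains an explicit stack would be needed to stay within Python's recursion limit.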

  9. Which is best? [Figure: speedup versus number of threads (1 to 16) for Dataflow and Dataflow-recursive.] With lots of (yet) unexplained optimizations.

  10. Outline: (1) Parallel Graph Coloring, (2) Dataflow Graph Coloring, (3) What's the link with scheduling?, (4) Conclusion.

  11. In practice, parallel speedup is 1. The graph is executed one vertex after another, so there are actually dependencies. Graham List Scheduling: when scheduling a DAG, a greedy algorithm gets $C_{\max} \le \frac{W}{p} + \left(1 - \frac{1}{p}\right) CP$, where $W$ is the total work, $p$ the number of processors, and $CP$ the length of the critical path. [Figure: example graph on vertices 1 to 10.]
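For the dataflow coloring DAG, the bound can be evaluated directly. The sketch below assumes every vertex is a unit-time task and that edges point from the lower to the higher ID, so W is the number of vertices and CP is the longest ID-increasing path; the function name and the example graph are hypothetical.

```python
def graham_bound(adj, p):
    """Upper bound W/p + (1 - 1/p) * CP for unit-time vertex tasks."""
    n = len(adj)
    depth = [0] * n
    for v in range(n):
        # Longest chain of lower-ID dependencies ending at v.
        depth[v] = 1 + max((depth[w] for w in adj[v] if w < v), default=0)
    work = n                          # W: one time unit per vertex
    critical_path = max(depth, default=0)
    return work / p + (1 - 1 / p) * critical_path


# Hypothetical graph with edges 0-2, 1-2, 1-3, 2-4, 3-4, 3-5 (W = 6, CP = 3).
graph = [[2], [2, 3], [0, 1, 4], [1, 4, 5], [2, 3], [3]]
bound_two_threads = graham_bound(graph, p=2)   # 6/2 + (1/2) * 3 = 4.5
```

With p = 1 the bound reduces to W, and as p grows it approaches CP, which is why a long critical path caps the achievable speedup.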


  13. It gets worse... Static Scheduling: if you use a static OpenMP schedule, you add de facto dependencies in your graph, and the critical path increases significantly. That's easy! Let's use dynamic instead! [Figure: example graph on vertices 1 to 10.]


  15. Is Dynamic Better? Dynamic Scheduling: even if you use a dynamic OpenMP schedule, a similar effect still happens. With two threads, 4 and 5 need to be executed before 6 can start. So 1, 2 and 3 are implicit predecessors of 6, because that is what the scheduler will do. An easy solution: compute a level-by-level order. Well... that requires a graph traversal. The whole point was to traverse the graph only once. [Figure: example graph on vertices 1 to 10.]
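The effect can be reproduced with a small simulation. The model below is an assumption made for illustration, not a description of any OpenMP runtime: tasks take one time unit, a free thread always grabs the next vertex in ID order, and it then blocks until all lower-ID neighbors of that vertex have finished. The function name and both graphs are hypothetical.

```python
import heapq


def id_order_makespan(adj, p):
    """Makespan when p threads take vertices strictly in ID order."""
    free_at = [0] * p                 # time at which each thread becomes free
    heapq.heapify(free_at)
    finish = [0] * len(adj)
    for v in range(len(adj)):
        grabbed = heapq.heappop(free_at)          # earliest free thread takes v
        ready = max((finish[w] for w in adj[v] if w < v), default=0)
        finish[v] = max(grabbed, ready) + 1       # wait for predecessors, then run
        heapq.heappush(free_at, finish[v])
    return max(finish, default=0)


# Chain 0-1-2-3 numbered first, followed by four isolated vertices.
chain_first = [[1], [0, 2], [1, 3], [2], [], [], [], []]
# Same graph with the chain on even IDs and the isolated vertices on odd IDs.
interleaved = [[2], [], [0, 4], [], [2, 6], [], [4], []]

slow = id_order_makespan(chain_first, p=2)   # threads block on the chain
fast = id_order_makespan(interleaved, p=2)   # independent work fills the gaps
```

Both graphs have the same work and the same critical path, yet under this model the first numbering finishes in 6 time units and the second in 4, which is the kind of ordering-induced dependency the slide describes.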


  17. It gets EVEN worse. Nobody should use "dynamic,1". Since vertices are grouped together by OpenMP's granularity, you have implicit edges between the vertices of each group. There is an implicit edge between 1 and 2, and between 5 and 6. In this type of kernel, you should use groups of at least 32 vertices. [Figure: example graph on vertices 1 to 10.]
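The implicit edges introduced by chunking can be enumerated with a few lines. The sketch assumes vertices are numbered 1 to n, chunks are consecutive ID ranges of a fixed size, and each chunk is processed sequentially by one thread; the function name is hypothetical.

```python
def implicit_chunk_edges(n, chunk):
    """Pairs (v, v + 1) that one thread is forced to execute back to back."""
    edges = []
    for v in range(1, n):
        # IDs start at 1, so vertex v sits in chunk number (v - 1) // chunk.
        if (v - 1) // chunk == v // chunk:
            edges.append((v, v + 1))
    return edges


pairs = implicit_chunk_edges(6, chunk=2)   # [(1, 2), (3, 4), (5, 6)]
```

Pairs that are already edges of the graph add nothing new, which would explain why only some of them show up as extra dependencies for a given graph.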


  19. First Results. [Figure: speedup versus number of threads (1 to 8) on two graphs, (a) auto and (b) ldoor; curves: Real Dynamic, Expected Dynamic, Real Static, Expected Static.] Ouch!


  21. Reordering of vertex IDs. Just a chain (figure labels: 1 2 3 4 5 6 7 8): the critical path can be quite long in a natural ordering. At best (figure labels: 2 4 1 3 6 5 7 8): need to traverse the graphs... At random (figure labels: 1 3 5 7 2 4 6 8): not the best, but probably not the worst you can get. For cache purposes, you need to keep some locality, so you shuffle blocks of vertices. Any guarantee on that?
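Shuffling blocks of vertices rather than single vertices can be sketched as below. The block size, the seed parameter, and both function names are assumptions for illustration; the slides do not say how the blocks were chosen.

```python
import random


def block_shuffle_order(n, block, seed=0):
    """Permutation of 0..n-1 that shuffles whole blocks of consecutive IDs.

    Returns a list `order` where order[new_id] is the old ID.
    """
    rng = random.Random(seed)
    starts = list(range(0, n, block))
    rng.shuffle(starts)                          # randomize block positions only
    order = []
    for s in starts:
        order.extend(range(s, min(s + block, n)))  # IDs inside a block stay adjacent
    return order


def relabel(adj, order):
    """Rebuild the adjacency lists under the new vertex numbering."""
    new_id = {old: new for new, old in enumerate(order)}
    new_adj = [[] for _ in adj]
    for old, neighbors in enumerate(adj):
        new_adj[new_id[old]] = sorted(new_id[w] for w in neighbors)
    return new_adj


# Hypothetical example: an 8-vertex chain, shuffled in blocks of 2.
chain = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5, 7], [6]]
shuffled = relabel(chain, block_shuffle_order(len(chain), block=2, seed=1))
```

Vertices inside a block keep consecutive new IDs, which preserves some locality, while the random placement of blocks breaks up long ID-increasing chains only in expectation; as the slide asks, nothing here gives a guarantee.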

  22. Results.

  23. Outline: (1) Parallel Graph Coloring, (2) Dataflow Graph Coloring, (3) What's the link with scheduling?, (4) Conclusion.
