Parallel Depth First on GPU
M. Naumov, A. Vrielink and M. Garland, GTC 2017
AGENDA
• Introduction
• Directed Trees
• Directed Acyclic Graphs (DAGs)
• Path- and SSSP-based variants
• Optimizations
• Performance Experiments
What is DFS?
Depth-first search on the example graph with nodes a, b, c, d, e, f, g, i, j, starting at the root a. A node enters the discovery list when it is first visited and the finish list once all of its children have been explored:
• a (root, no parent), b (parent a) and e (parent b) are discovered; e has no unvisited children, so it finishes first.
• f (parent b) and i (parent f) are discovered; i finishes.
• j (parent f) is discovered and finishes, followed by f and b.
• c (parent a) is discovered and finishes; d (parent a) and g (parent d) are discovered; g, d and a finish.
Final result:
Node:      a, b, c, d, e, f, g, i, j
Parent:    /, a, a, a, b, b, d, f, f
Discovery: a, b, e, f, i, j, c, d, g
Finish:    e, i, j, f, b, c, g, d, a
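A minimal sequential Python sketch of the walkthrough above, useful as a reference point for the parallel variants that follow. The adjacency list is an assumption: it is reconstructed from the parent list plus the edge d→f that appears in the path-based example, and children are visited in alphabetical order. The names `GRAPH` and `dfs` are hypothetical.

```python
# Hypothetical adjacency list reconstructed from the slide (assumption):
# tree edges from the parent list plus the non-tree edge d->f.
GRAPH = {
    "a": ["b", "c", "d"], "b": ["e", "f"], "c": [], "d": ["f", "g"],
    "e": [], "f": ["i", "j"], "g": [], "i": [], "j": [],
}


def dfs(graph, root):
    """Iterative DFS returning parent map, discovery order and finish order."""
    parent = {root: None}
    discovery, finish = [root], []
    # Each stack entry keeps an iterator, so a node resumes where it left off.
    stack = [(root, iter(sorted(graph[root])))]
    while stack:
        node, children = stack[-1]
        for child in children:
            if child not in parent:  # first visit: record parent and discovery
                parent[child] = node
                discovery.append(child)
                stack.append((child, iter(sorted(graph[child]))))
                break
        else:  # all children explored: the node finishes
            finish.append(node)
            stack.pop()
    return parent, discovery, finish


parent, discovery, finish = dfs(GRAPH, "a")
print(discovery)  # ['a', 'b', 'e', 'f', 'i', 'j', 'c', 'd', 'g']
print(finish)     # ['e', 'i', 'j', 'f', 'b', 'c', 'g', 'd', 'a']
```

The explicit stack replaces recursion, which keeps the sketch safe from Python's recursion limit on deep graphs; the inherently sequential order of visits is exactly what the parallel algorithms below avoid.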
Previous Work on DFS
Lexicographic DFS is a building block for topological sort, bi-connectivity and planarity testing. Known parallel algorithms cover planar graphs, directed graphs with cycles and directed acyclic graphs (DAGs), with the following bounds:
• Time: O(√n log^11 n), O(log^2 n), O(log^2 n)
• Processors: O(n^3), O(n^ω / log n), O(n)
where ω < 2.373 is the matrix multiplication exponent.
DIRECTED TREES
Directed Tree, Phase 2: Bottom-Up Traversal
Every node holds an array that starts as [0] and receives, for each child, the size of that child's sub-tree (the child together with its descendants). Once all of its children have reported, the node computes the prefix sum of its array and reports its own sub-tree size (last entry + 1) to its parent.
• Leaves c, e, g, i, j: [0]
• f: receives 1 from i and 1 from j, [0,1,1], prefix sum [0,1,2]
• d: receives 1 from g, [0,1]
• b: receives 1 from e and 3 from f, [0,1,3], prefix sum [0,1,4]
• a: receives 5 from b, 1 from c and 2 from d, [0,5,1,2], prefix sum [0,5,6,8]
This phase is done; the next phase is about to start.
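The bottom-up phase can be sketched sequentially in Python. This only reproduces the arithmetic; it does not model how the GPU schedules nodes whose children have all reported. The children lists are assumed from the figure, and `CHILDREN` and `bottom_up` are hypothetical names.

```python
from itertools import accumulate

# Hypothetical children lists of the example tree, in visiting order (assumption).
CHILDREN = {
    "a": ["b", "c", "d"], "b": ["e", "f"], "c": [], "d": ["g"],
    "e": [], "f": ["i", "j"], "g": [], "i": [], "j": [],
}


def bottom_up(children, root):
    """Return each node's prefix sum of [0, size(child_1), size(child_2), ...]."""
    # Any order that lists parents before children works; build one with a stack.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children[node])
    prefix = {}
    # Reversing it guarantees every child is handled before its parent.
    for node in reversed(order):
        # A child's sub-tree size is its descendant count (last entry) plus itself.
        sizes = [0] + [prefix[c][-1] + 1 for c in children[node]]
        prefix[node] = list(accumulate(sizes))
    return prefix


prefix = bottom_up(CHILDREN, "a")
print(prefix["a"], prefix["b"], prefix["f"])  # [0, 5, 6, 8] [0, 1, 4] [0, 1, 2]
```

On a GPU the per-node prefix sum would be a parallel scan; `itertools.accumulate` stands in for it here.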
Directed Tree, Phase 3: Top-Down Traversal
Starting with offset 0 at the root a, each child's offset is its parent's offset plus the child's entry in the parent's prefix-sum array. For example, d gets offset 0 + 6 = 6, e gets 0 + 0 + 0 = 0 (via b) and i gets 0 + 1 + 0 = 1 (via b and f).
Each node then computes its times independently:
• discovery = offset + depth, e.g. d: 6 + 1 = 7, e: 0 + 2 = 2, i: 1 + 3 = 4
• finish = offset + sub-tree size, where the sub-tree size is the number of descendants (the last entry of the node's prefix-sum array), e.g. d: 6 + 1 = 7, e: 0 + 0 = 0, i: 1 + 0 = 1
These values are the positions (counting from 0) of each node in the discovery and finish orders.
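The top-down phase can likewise be sketched in Python, taking the prefix-sum arrays from the bottom-up phase as input. This is a sequential illustration under the same assumed tree; `CHILDREN`, `PREFIX` and `top_down` are hypothetical names, and on a GPU each level of the tree would be processed in parallel.

```python
# Hypothetical children lists (assumption) and the Phase 2 prefix sums from the slides.
CHILDREN = {
    "a": ["b", "c", "d"], "b": ["e", "f"], "c": [], "d": ["g"],
    "e": [], "f": ["i", "j"], "g": [], "i": [], "j": [],
}
PREFIX = {
    "a": [0, 5, 6, 8], "b": [0, 1, 4], "c": [0], "d": [0, 1],
    "e": [0], "f": [0, 1, 2], "g": [0], "i": [0], "j": [0],
}


def top_down(children, prefix, root):
    """Return 0-based discovery and finish times from offsets and depths."""
    offset, depth = {root: 0}, {root: 0}
    stack = [root]
    while stack:
        node = stack.pop()
        for k, child in enumerate(children[node]):
            # child offset = parent offset + child's entry in the parent's prefix sum
            offset[child] = offset[node] + prefix[node][k]
            depth[child] = depth[node] + 1
            stack.append(child)
    discovery = {v: offset[v] + depth[v] for v in offset}
    # sub-tree size here = number of descendants = last prefix-sum entry
    finish = {v: offset[v] + prefix[v][-1] for v in offset}
    return discovery, finish


discovery, finish = top_down(CHILDREN, PREFIX, "a")
print(sorted(discovery, key=discovery.get))  # ['a', 'b', 'e', 'f', 'i', 'j', 'c', 'd', 'g']
print(sorted(finish, key=finish.get))        # ['e', 'i', 'j', 'f', 'b', 'c', 'g', 'd', 'a']
```

Each node's times depend only on its own offset, depth and prefix array, which is what makes this phase embarrassingly parallel once offsets are known.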
DIRECTED ACYCLIC GRAPHS: PATH-BASED VARIANT
Path-Based (for DAGs), Phase 1
Paths from the root a are propagated down the DAG. Two paths collide at node f: left [a,b,f] and right [a,d,f]. The collision is resolved by choosing the lexicographically smallest path:
• wait until all paths to a node are traversed
• align the path sequences: left [a,b,f], right [a,d,f]
• compare left-to-right and choose the smallest, here [a,b,f], so f is reached through b
This phase is done.
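The collision rule can be sketched sequentially: visiting nodes in topological order guarantees that all paths into a node are known before it is resolved, and Python's list comparison performs the left-to-right alignment. The DAG (tree edges plus d→f) is an assumption, node labels are assumed to sort in each parent's child order, and `DAG` and `path_based` are hypothetical names.

```python
from graphlib import TopologicalSorter

# Hypothetical DAG: tree edges from the DFS example plus the colliding edge d->f.
DAG = {
    "a": ["b", "c", "d"], "b": ["e", "f"], "c": [], "d": ["f", "g"],
    "e": [], "f": ["i", "j"], "g": [], "i": [], "j": [],
}


def path_based(dag, root):
    """Keep, for every node, the lexicographically smallest root-to-node path."""
    preds = {v: [] for v in dag}
    for u, succs in dag.items():
        for v in succs:
            preds[v].append(u)
    path = {}
    # TopologicalSorter takes node -> predecessors, so parents come out first.
    for v in TopologicalSorter(preds).static_order():
        if v == root:
            path[v] = [root]
            continue
        candidates = [path[u] + [v] for u in preds[v] if u in path]
        if candidates:
            # Lists compare element by element, left to right.
            path[v] = min(candidates)
    return path


path = path_based(DAG, "a")
print(path["f"])  # ['a', 'b', 'f']
```

The DFS parent of each node is the second-to-last entry of its winning path.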
OPTIMIZATIONS
Path Pruning
Example: the paths [a,c,d,f] and [a,b,e,f] both reach node f.
• When two paths reach the same node, there exists a parent "a" where the paths split: [a,b,…] and [a,c,…].
• It is the comparison between "b" and "c" that allows us to distinguish between the paths.
• A parent node with a single outgoing edge will never be a decision point.
• There is no need to store nodes with such parents, so the stored paths shrink to [a,c,f] and [a,b,f].
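A small Python sketch of the pruning rule, assuming the graph in the example is exactly a→b, a→c, b→e, c→d, e→f, d→f (only out-degrees are needed). `OUT_DEGREE` and `prune` are hypothetical names. The check at the end shows that the pruned paths still compare the same way as the full ones.

```python
# Hypothetical out-degrees for the pruning example: a->b, a->c, b->e, c->d, e->f, d->f.
OUT_DEGREE = {"a": 2, "b": 1, "c": 1, "d": 1, "e": 1, "f": 0}


def prune(path, out_degree):
    """Drop nodes whose predecessor on the path has a single outgoing edge.

    Such a predecessor is never a decision point, so the node carries no
    information for the left-to-right comparison. The root and the endpoint
    are always kept.
    """
    if len(path) < 2:
        return list(path)
    kept = [path[0]]
    for prev, node in zip(path, path[1:-1]):
        if out_degree[prev] > 1:
            kept.append(node)
    kept.append(path[-1])
    return kept


left = prune(["a", "c", "d", "f"], OUT_DEGREE)   # ['a', 'c', 'f']
right = prune(["a", "b", "e", "f"], OUT_DEGREE)  # ['a', 'b', 'f']
print(min(left, right))                           # ['a', 'b', 'f']
```

The first node after the split point always has a parent with out-degree at least 2, so it survives pruning in both paths and the comparison outcome is unchanged.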
Phase Composition
SSSP-BASED VARIANT
SSSP-based (for DAGs), Phase 1: Bottom-Up Traversal
Run the algorithm for Directed Trees, but:
• propagate the # of nodes to all the parents
• start the prefix sum with 1 (instead of 0)
Each node reports the last entry of its prefix-sum array (its # of nodes, counting itself) to every parent:
• Leaves c, e, g, i, j: [1]
• f: receives 1 from i and 1 from j, [1,1,1], prefix sum [1,2,3]
• d: receives 1 from g, [1,1], prefix sum [1,2]
• b: receives 1 from e and 3 from f, [1,1,3], prefix sum [1,2,5]
• a: receives 5 from b, 1 from c, 2 from d and 1 from j, [1,5,1,2,1], prefix sum [1,6,7,9,10]
Assign the # of nodes as the edge weight: each entry except the last becomes the weight of the edge to the corresponding child, giving a→b 1, a→c 6, a→d 7, a→j 9, b→e 1, b→f 2, d→g 1, f→i 1, f→j 2.
This phase is done; the next phase is about to start.
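The SSSP bottom-up phase can be sketched in Python by processing children before parents and seeding every prefix sum with 1. The DAG below, including the edge a→j that carries weight 9, is inferred from the slide values and is an assumption; `DAG` and `sssp_weights` are hypothetical names.

```python
from graphlib import TopologicalSorter
from itertools import accumulate

# Hypothetical DAG inferred from the slides; the edge a->j is an assumption.
DAG = {
    "a": ["b", "c", "d", "j"], "b": ["e", "f"], "c": [], "d": ["g"],
    "e": [], "f": ["i", "j"], "g": [], "i": [], "j": [],
}


def sssp_weights(dag):
    """Bottom-up pass: prefix sums seeded with 1, counts sent to every parent."""
    prefix = {}
    # Passing node -> successors (instead of predecessors) makes static_order()
    # emit every child before its parents, i.e. a bottom-up order.
    for v in TopologicalSorter(dag).static_order():
        counts = [1] + [prefix[c][-1] for c in dag[v]]
        prefix[v] = list(accumulate(counts))
    # Entry k of the parent's prefix sum becomes the weight of its k-th edge.
    weights = {(u, c): prefix[u][k] for u in dag for k, c in enumerate(dag[u])}
    return prefix, weights


prefix, weights = sssp_weights(DAG)
print(prefix["a"])  # [1, 6, 7, 9, 10]
```

Because j reports to both f and a, it is counted twice in a's total of 10 even though the DAG has 9 nodes; the weights only need to preserve the DFS order, not be exact counts.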
SSSP-based (for DAGs), Phase 2: Top-Down Traversal
Compute single-source shortest paths from a with these edge weights. For example, j is reached directly from a with weight 9, but the path a→b→f→j has length 1 + 2 + 2 = 5 < 9. The shortest path is the DFS path.
This phase is done.
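The slides do not say which SSSP algorithm runs on the GPU, so this Python sketch swaps in the textbook method for DAGs: relax each node's incoming edges in topological order. The weights are the Phase 1 values (with a→j inferred), and `WEIGHTS` and `dag_shortest_paths` are hypothetical names.

```python
from graphlib import TopologicalSorter

# Edge weights from the bottom-up phase (slide values; the edge a->j is inferred).
WEIGHTS = {
    ("a", "b"): 1, ("a", "c"): 6, ("a", "d"): 7, ("a", "j"): 9,
    ("b", "e"): 1, ("b", "f"): 2, ("d", "g"): 1, ("f", "i"): 1, ("f", "j"): 2,
}


def dag_shortest_paths(weights, root):
    """Single-source shortest paths on a DAG by relaxing nodes in topological order."""
    preds = {}
    for u, v in weights:
        preds.setdefault(u, set())
        preds.setdefault(v, set()).add(u)
    dist, parent = {root: 0}, {root: None}
    for v in TopologicalSorter(preds).static_order():
        for u in preds[v]:
            if u not in dist:
                continue  # u is not reachable from the root
            d = dist[u] + weights[(u, v)]
            if v not in dist or d < dist[v]:
                dist[v], parent[v] = d, u
    return dist, parent


dist, parent = dag_shortest_paths(WEIGHTS, "a")
print(dist["j"], parent["j"])  # 5 f
```

The resulting `parent` map matches the DFS parents from the first example, and the edge a→j with weight 9 is correctly rejected in favor of the path through f.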
OPTIMIZATIONS
Discovery Time, Phase 3a: Sorting
The length of the shortest path defines an ordering of the nodes: a 0, b 1, e 2, f 3, i 4, j 5, c 6, d 7, g 8. We can sort the nodes by this length to obtain the discovery time.
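A short Python sketch of the sorting step, using Python's built-in sort in place of a parallel GPU sort. The distances are the slide values; `DIST` and `discovery_times` are hypothetical names.

```python
# Shortest-path lengths from the previous phase (values from the slides).
DIST = {"a": 0, "b": 1, "c": 6, "d": 7, "e": 2, "f": 3, "g": 8, "i": 4, "j": 5}


def discovery_times(dist):
    """Sort nodes by shortest-path length; a node's rank is its discovery time."""
    order = sorted(dist, key=dist.get)
    return order, {v: rank for rank, v in enumerate(order)}


order, rank = discovery_times(DIST)
print(order)  # ['a', 'b', 'e', 'f', 'i', 'j', 'c', 'd', 'g']
```

In this example the lengths already run from 0 to 8, but since counts are propagated to all parents they can overcount in a DAG, so in general the lengths only give an ordering with possible gaps, and ranking them is what yields consecutive discovery times.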