Triangular Systems 1D Algorithms Wavefront Algorithms 2D Algorithms and TRSM Parallel Numerical Algorithms Chapter 3 – Dense Linear Systems Section 3.3 – Triangular Linear Systems Michael T. Heath and Edgar Solomonik Department of Computer Science University of Illinois at Urbana-Champaign CS 554 / CSE 512 Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 1 / 40
Triangular Systems 1D Algorithms Wavefront Algorithms 2D Algorithms and TRSM Outline Triangular Systems 1 1D Algorithms 2 Wavefront Algorithms 3 2D Algorithms and TRSM 4 Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 2 / 40
Triangular Systems 1D Algorithms Wavefront Algorithms 2D Algorithms and TRSM Triangular Matrices Matrix L is lower triangular if all entries above its main diagonal are zero, ℓ ij = 0 for i < j Matrix U is upper triangular if all entries below its main diagonal are zero, u ij = 0 for i > j Triangular matrices are important because triangular linear systems are easily solved by successive substitution Most direct methods for solving general linear systems first reduce matrix to triangular form and then solve resulting equivalent triangular system(s) Triangular systems are also frequently used as preconditioners in iterative methods for solving linear systems Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 3 / 40
Triangular Systems 1D Algorithms Wavefront Algorithms 2D Algorithms and TRSM Forward Substitution For lower triangular system Lx = b , solution can be obtained by forward substitution i − 1 � � � x i = b i − ℓ ij x j /ℓ ii , i = 1 , . . . , n j =1 for j = 1 to n x j = b j /ℓ jj { compute soln component } for i = j + 1 to n b i = b i − ℓ ij x j { update right-hand side } end end Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 4 / 40
Triangular Systems 1D Algorithms Wavefront Algorithms 2D Algorithms and TRSM Back Substitution For upper triangular system Ux = b , solution can be obtained by back substitution n � � � x i = b i − u ij x j /u ii , i = n, . . . , 1 j = i +1 for j = n to 1 x j = b j /u jj { compute soln component } for i = 1 to j − 1 b i = b i − u ij x j { update right-hand side } end end Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 5 / 40
Triangular Systems 1D Algorithms Wavefront Algorithms 2D Algorithms and TRSM Solving Triangular Systems Forward or back substitution requires about n 2 / 2 multiplications and similar number of additions, so serial exeuction time is T 1 = Θ( γn 2 ) We will consider only lower triangular systems, as analogous algorithms for upper triangular systems are similar The depth of triangular solve is D = Θ( n ) , so the maximum speed-up is T 1 /D = Θ( n ) Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 6 / 40
Triangular Systems 1D Algorithms Wavefront Algorithms 2D Algorithms and TRSM Loop Orderings for Forward Substitution for j = 1 to n for i = 1 to n x j = b j /ℓ jj for j = 1 to i − 1 for i = j + 1 to n b i = b i − ℓ ij x j b i = b i − ℓ ij x j end end x i = b i /ℓ ii end end right-looking left-looking immediate-update delayed-update data-driven demand-driven fan-out fan-in Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 7 / 40
Triangular Systems 1D Algorithms Fine-Grain Algorithm Wavefront Algorithms 2D Algorithms and TRSM Parallel Algorithm Partition For i = 2 , . . . , n , j = 1 , . . . , i − 1 , fine-grain task ( i, j ) stores ℓ ij and computes product ℓ ij x j For i = 1 , . . . , n , fine-grain task ( i, i ) stores ℓ ii and b i , collects sum t i = � i − 1 j =1 ℓ ij x j , and computes and stores x i = ( b i − t i ) /ℓ ii yielding 2-D triangular array of n ( n + 1) / 2 fine-grain tasks Communicate For j = 1 , . . . , n − 1 , task ( j, j ) broadcasts x j to tasks ( i, j ) , i = j + 1 , . . . , n For i = 2 , . . . , n , sum reduction of products ℓ ij x j across tasks ( i, j ) , j = 1 , . . . , i , with task ( i, i ) as root Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 8 / 40
Triangular Systems 1D Algorithms Fine-Grain Algorithm Wavefront Algorithms 2D Algorithms and TRSM Fine-Grain Tasks and Communication ℓ 11 b 1 x 1 ℓ 22 ℓ 21 b 2 x 2 ℓ 33 ℓ 31 ℓ 32 b 3 x 3 ℓ 44 ℓ 41 ℓ 42 ℓ 43 b 4 x 4 ℓ 55 ℓ 51 ℓ 52 ℓ 53 ℓ 54 b 5 x 5 ℓ 66 ℓ 61 ℓ 62 ℓ 63 ℓ 64 ℓ 65 b 6 x 6 Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 9 / 40
Triangular Systems 1D Algorithms Fine-Grain Algorithm Wavefront Algorithms 2D Algorithms and TRSM Fine-Grain Parallel Algorithm if i = j then t = 0 if i > 1 then recv sum reduction of t across tasks ( i, k ) , k = 1 , . . . , i end x i = ( b i − t ) /ℓ ii broadcast x i to tasks ( k, i ) , k = i + 1 , . . . , n else recv broadcast of x j from task ( j, j ) t = ℓ ij x j reduce t across tasks ( i, k ) , k = 1 , . . . , i end Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 10 / 40
Triangular Systems 1D Algorithms Fine-Grain Algorithm Wavefront Algorithms 2D Algorithms and TRSM Fine-Grain Algorithm If communication is suitably pipelined, then fine-grain algorithm can achieve Θ( n ) execution time, but uses Θ( n 2 ) tasks, so it is inefficient If there are multiple right-hand-side vectors b , then successive solutions can be pipelined to increase overall efficiency Agglomerating fine-grain tasks yields more reasonable number of tasks and improves ratio of computation to communication Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 11 / 40
Triangular Systems 1D Algorithms Fine-Grain Algorithm Wavefront Algorithms 2D Algorithms and TRSM Agglomeration Agglomerate With n × n array of fine-grain tasks, natural strategies are 2-D: combine k × k subarray of fine-grain tasks to form each coarse-grain task, yielding ( n/k ) 2 coarse-grain tasks 1-D column: combine n fine-grain tasks in each column into coarse-grain task, yielding n coarse-grain tasks 1-D row: combine n fine-grain tasks in each row into coarse-grain task, yielding n coarse-grain tasks Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 12 / 40
Triangular Systems 1D Algorithms Fine-Grain Algorithm Wavefront Algorithms 2D Algorithms and TRSM 2-D Agglomeration ℓ 11 b 1 x 1 ℓ 22 ℓ 21 b 2 x 2 ℓ 33 ℓ 31 ℓ 32 b 3 x 3 ℓ 44 ℓ 41 ℓ 42 ℓ 43 b 4 x 4 ℓ 55 ℓ 51 ℓ 52 ℓ 53 ℓ 54 b 5 x 5 ℓ 66 ℓ 61 ℓ 62 ℓ 63 ℓ 64 ℓ 65 b 6 x 6 Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 13 / 40
Triangular Systems 1D Algorithms Fine-Grain Algorithm Wavefront Algorithms 2D Algorithms and TRSM 1-D Column Agglomeration ℓ 11 b 1 x 1 ℓ 22 ℓ 21 b 2 x 2 ℓ 33 ℓ 31 ℓ 32 b 3 x 3 ℓ 44 ℓ 41 ℓ 42 ℓ 43 b 4 x 4 ℓ 55 ℓ 51 ℓ 52 ℓ 53 ℓ 54 b 5 x 5 ℓ 66 ℓ 61 ℓ 62 ℓ 63 ℓ 64 ℓ 65 b 6 x 6 Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 14 / 40
Triangular Systems 1D Algorithms Fine-Grain Algorithm Wavefront Algorithms 2D Algorithms and TRSM 1-D Row Agglomeration ℓ 11 b 1 x 1 ℓ 22 ℓ 21 b 2 x 2 ℓ 33 ℓ 31 ℓ 32 b 3 x 3 ℓ 44 ℓ 41 ℓ 42 ℓ 43 b 4 x 4 ℓ 55 ℓ 51 ℓ 52 ℓ 53 ℓ 54 b 5 x 5 ℓ 66 ℓ 61 ℓ 62 ℓ 63 ℓ 64 ℓ 65 b 6 x 6 Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 15 / 40
Triangular Systems 1D Algorithms Fine-Grain Algorithm Wavefront Algorithms 2D Algorithms and TRSM Mapping Map 2-D: assign ( n/k ) 2 /p coarse-grain tasks to each of p processors using any desired mapping in each dimension, treating target network as 2-D mesh 1-D: assign n/p coarse-grain tasks to each of p processors using any desired mapping, treating target network as 1-D mesh Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 16 / 40
Triangular Systems 1D Algorithms Fine-Grain Algorithm Wavefront Algorithms 2D Algorithms and TRSM 1-D Column Agglomeration, Block Mapping ℓ 11 b 1 x 1 ℓ 22 ℓ 21 b 2 x 2 ℓ 33 ℓ 31 ℓ 32 b 3 x 3 ℓ 44 ℓ 41 ℓ 42 ℓ 43 b 4 x 4 ℓ 55 ℓ 51 ℓ 52 ℓ 53 ℓ 54 b 5 x 5 ℓ 66 ℓ 61 ℓ 62 ℓ 63 ℓ 64 ℓ 65 b 6 x 6 Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 17 / 40
Triangular Systems 1D Algorithms Fine-Grain Algorithm Wavefront Algorithms 2D Algorithms and TRSM 1-D Column Agglomeration, Cyclic Mapping ℓ 11 b 1 x 1 ℓ 22 ℓ 21 b 2 x 2 ℓ 33 ℓ 31 ℓ 32 b 3 x 3 ℓ 44 ℓ 41 ℓ 42 ℓ 43 b 4 x 4 ℓ 55 ℓ 51 ℓ 54 ℓ 52 ℓ 53 b 5 x 5 ℓ 66 ℓ 61 ℓ 64 ℓ 62 ℓ 65 ℓ 63 b 6 x 6 Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 18 / 40
Triangular Systems 1D Algorithms Fine-Grain Algorithm Wavefront Algorithms 2D Algorithms and TRSM 1-D Aggregation with Block-Cyclic Mapping Cost With block-size b , 1D partitioning requires n/b broadcasts of b items for row-agglomeration requires n/b reductions of b items for column-agglomeration in both cases O ( nb/p + b 2 ) work must be done to solve for b entries of x between each of the n/b collectives The overall execution time is � � α ( n/b ) log( p ) + βn + γ ( n 2 /p + nb ) T p ( n, b ) = Θ Selecting block-size b = n/p , parallel execution time is � � αp log( p ) + βn + γn 2 /p T p ( n, n/p ) = Θ Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 19 / 40
Recommend
More recommend