LU Factorization Parallel Algorithms for LU Partial Pivoting Parallel Numerical Algorithms Chapter 3 – Dense Linear Systems Section 3.2 – LU Factorization Michael T. Heath and Edgar Solomonik Department of Computer Science University of Illinois at Urbana-Champaign CS 554 / CSE 512 Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 1 / 44
LU Factorization Parallel Algorithms for LU Partial Pivoting Outline LU Factorization 1 Motivation Gaussian Elimination Parallel Algorithms for LU 2 Fine-Grain Algorithm Agglomeration Schemes Mapping Schemes Scalability Partial Pivoting 3 Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 2 / 44
LU Factorization Motivation Parallel Algorithms for LU Gaussian Elimination Partial Pivoting LU Factorization System of linear algebraic equations has form Ax = b where A is given n × n matrix, b is given n -vector, and x is unknown solution n -vector to be computed Direct method for solving general linear system is by computing LU factorization A = LU where L is unit lower triangular and U is upper triangular Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 3 / 44
LU Factorization Motivation Parallel Algorithms for LU Gaussian Elimination Partial Pivoting LU Factorization System Ax = b then becomes LUx = b Solve lower triangular system Ly = b by forward-substitution to obtain vector y Finally, solve upper triangular system Ux = y by back-substitution to obtain solution x to original system Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 4 / 44
LU Factorization Motivation Parallel Algorithms for LU Gaussian Elimination Partial Pivoting Gaussian Elimination Algorithm LU factorization can be computed by Gaussian elimination as follows, where U overwrites A for k = 1 to n − 1 { loop over columns } for i = k + 1 to n { compute multipliers ℓ ik = a ik /a kk for current column } end for j = k + 1 to n for i = k + 1 to n { apply transformation to a ij = a ij − ℓ ik a kj remaining submatrix } end end end Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 5 / 44
LU Factorization Motivation Parallel Algorithms for LU Gaussian Elimination Partial Pivoting Gaussian Elimination Algorithm In general, row interchanges (pivoting) may be required to ensure existence of LU factorization and numerical stability of Gaussian elimination algorithm, but for simplicity we temporarily ignore this issue Gaussian elimination requires about n 3 / 3 paired additions and multiplications, so model serial time as T 1 = γ n 3 / 3 where γ is time required for multiply-add operation About n 2 / 2 divisions also required, but we ignore this lower-order term Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 6 / 44
LU Factorization Motivation Parallel Algorithms for LU Gaussian Elimination Partial Pivoting Loop Orderings for Gaussian Elimination Gaussian elimination has general form of triple-nested loop in which entries of L and U overwrite those of A for for for a ij = a ij − ( a ik /a kk ) a kj end end end Indices i , j , and k of for loops can be taken in any order, for total of 3! = 6 different ways of arranging loops Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 7 / 44
LU Factorization Motivation Parallel Algorithms for LU Gaussian Elimination Partial Pivoting Loop Orderings for Gaussian Elimination Different loop orders have different memory access patterns, which may cause their performance to vary widely Right-looking orderings (loop over k is outermost) perform updates to the trailing matrix (update all a ij for i, j ≥ k ) eagerly Left-looking orderings (loop over k is innermost) update the trailing matrix lazily (updates to a ij done only when all entries a i ′ j ′ with min( i ′ , j ′ ) < min( i, j ) have been updated) Right-looking ordering achieve better read-locality (the same divisor and outer-product vectors are reused) Left-looking ordering achieve better write-locality (entries of A may be changed in memory only once) Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 8 / 44
LU Factorization Motivation Parallel Algorithms for LU Gaussian Elimination Partial Pivoting Gaussian Elimination Algorithm Right-looking form of Gaussian elimination for k = 1 to n − 1 for i = k + 1 to n ℓ ik = a ik /a kk end for j = k + 1 to n for i = k + 1 to n a ij = a ij − ℓ ik a kj end end end Multipliers ℓ ik computed outside inner loop for greater efficiency Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 9 / 44
Fine-Grain Algorithm LU Factorization Agglomeration Schemes Parallel Algorithms for LU Mapping Schemes Partial Pivoting Scalability Parallel Algorithm Partition For i, j = 1 , . . . , n , fine-grain task ( i, j ) stores a ij and computes and stores � u ij , if i ≤ j ℓ ij , if i > j yielding 2-D array of n 2 fine-grain tasks Communicate Broadcast entries of A vertically to tasks below Broadcast entries of L horizontally to tasks to right Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 10 / 44
Fine-Grain Algorithm LU Factorization Agglomeration Schemes Parallel Algorithms for LU Mapping Schemes Partial Pivoting Scalability Fine-Grain Tasks and Communication a 11 a 12 a 13 a 14 a 15 a 16 u 11 u 12 u 13 u 14 u 15 u 16 a 21 a 22 a 23 a 24 a 25 a 26 ℓ 21 u 22 u 23 u 24 u 25 u 26 a 31 a 32 a 33 a 34 a 35 a 36 ℓ 31 ℓ 32 u 33 u 34 u 35 u 36 a 41 a 42 a 43 a 44 a 45 a 46 ℓ 41 ℓ 42 ℓ 43 u 44 u 45 u 46 a 51 a 52 a 53 a 54 a 55 a 56 ℓ 51 ℓ 52 ℓ 53 ℓ 54 u 55 u 56 a 61 a 62 a 63 a 64 a 65 a 66 ℓ 61 ℓ 62 ℓ 63 ℓ 64 ℓ 65 u 66 Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 11 / 44
Fine-Grain Algorithm LU Factorization Agglomeration Schemes Parallel Algorithms for LU Mapping Schemes Partial Pivoting Scalability Fine-Grain Parallel Algorithm for k = 1 to min( i, j ) − 1 recv broadcast of a kj from task ( k, j ) { vert bcast } recv broadcast of ℓ ik from task ( i, k ) { horiz bcast } a ij = a ij − ℓ ik a kj { update entry } end if i ≤ j then broadcast a ij to tasks ( k, j ) , k = i + 1 , . . . , n { vert bcast } else recv broadcast of a jj from task ( j, j ) { vert bcast } ℓ ij = a ij /a jj { multiplier } broadcast ℓ ij to tasks ( i, k ) , k = j + 1 , . . . , n { horiz bcast } end Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 12 / 44
Fine-Grain Algorithm LU Factorization Agglomeration Schemes Parallel Algorithms for LU Mapping Schemes Partial Pivoting Scalability Agglomeration Agglomerate With n × n array of fine-grain tasks, natural strategies are 2-D: combine k × k subarray of fine-grain tasks to form each coarse-grain task, yielding ( n/k ) 2 coarse-grain tasks 1-D column: combine n fine-grain tasks in each column into coarse-grain task, yielding n coarse-grain tasks 1-D row: combine n fine-grain tasks in each row into coarse-grain task, yielding n coarse-grain tasks Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 13 / 44
Fine-Grain Algorithm LU Factorization Agglomeration Schemes Parallel Algorithms for LU Mapping Schemes Partial Pivoting Scalability 2-D Agglomeration a 11 a 12 a 13 a 14 a 15 a 16 u 11 u 12 u 13 u 14 u 15 u 16 a 21 a 22 a 23 a 24 a 25 a 26 ℓ 21 u 22 u 23 u 24 u 25 u 26 a 31 a 32 a 33 a 34 a 35 a 36 ℓ 31 ℓ 32 u 33 u 34 u 35 u 36 a 41 a 42 a 43 a 44 a 45 a 46 ℓ 41 ℓ 42 ℓ 43 u 44 u 45 u 46 a 51 a 52 a 53 a 54 a 55 a 56 ℓ 51 ℓ 52 ℓ 53 ℓ 54 u 55 u 56 a 61 a 62 a 63 a 64 a 65 a 66 ℓ 61 ℓ 62 ℓ 63 ℓ 64 ℓ 65 u 66 Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 14 / 44
Fine-Grain Algorithm LU Factorization Agglomeration Schemes Parallel Algorithms for LU Mapping Schemes Partial Pivoting Scalability Blocked LU factorization A Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 15 / 44
Fine-Grain Algorithm LU Factorization Agglomeration Schemes Parallel Algorithms for LU Mapping Schemes Partial Pivoting Scalability Blocked LU factorization U ₀₀ L ₀₀ Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 16 / 44
Fine-Grain Algorithm LU Factorization Agglomeration Schemes Parallel Algorithms for LU Mapping Schemes Partial Pivoting Scalability Blocked LU factorization U L Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 17 / 44
Fine-Grain Algorithm LU Factorization Agglomeration Schemes Parallel Algorithms for LU Mapping Schemes Partial Pivoting Scalability Blocked LU factorization U L S=A-LU Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 18 / 44
Fine-Grain Algorithm LU Factorization Agglomeration Schemes Parallel Algorithms for LU Mapping Schemes Partial Pivoting Scalability Coarse-Grain 2-D Parallel Algorithm for k = 1 to n − 1 broadcast { a kj : j ∈ mycols , j ≥ k } in processor column if k ∈ mycols then for i ∈ myrows , i > k ℓ ik = a ik /a kk { multipliers } end end broadcast { ℓ ik : i ∈ myrows , i > k } in processor row for j ∈ mycols , j > k for i ∈ myrows , i > k , a ij = a ij − ℓ ik a kj { update } end end end Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 19 / 44
Recommend
More recommend