Parallel Numerical Algorithms Chapter 3 Dense Linear Systems - PowerPoint PPT Presentation

LU Factorization Parallel Algorithms for LU Partial Pivoting Parallel Numerical Algorithms Chapter 3 – Dense Linear Systems Section 3.2 – LU Factorization Michael T. Heath and Edgar Solomonik Department of Computer Science University of Illinois at Urbana-Champaign CS 554 / CSE 512 Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 1 / 44

LU Factorization Parallel Algorithms for LU Partial Pivoting Outline LU Factorization 1 Motivation Gaussian Elimination Parallel Algorithms for LU 2 Fine-Grain Algorithm Agglomeration Schemes Mapping Schemes Scalability Partial Pivoting 3 Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 2 / 44

LU Factorization Motivation Parallel Algorithms for LU Gaussian Elimination Partial Pivoting LU Factorization System of linear algebraic equations has form Ax = b where A is given n × n matrix, b is given n -vector, and x is unknown solution n -vector to be computed Direct method for solving general linear system is by computing LU factorization A = LU where L is unit lower triangular and U is upper triangular Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 3 / 44

LU Factorization Motivation Parallel Algorithms for LU Gaussian Elimination Partial Pivoting LU Factorization System Ax = b then becomes LUx = b Solve lower triangular system Ly = b by forward-substitution to obtain vector y Finally, solve upper triangular system Ux = y by back-substitution to obtain solution x to original system Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 4 / 44

LU Factorization Motivation Parallel Algorithms for LU Gaussian Elimination Partial Pivoting Gaussian Elimination Algorithm LU factorization can be computed by Gaussian elimination as follows, where U overwrites A for k = 1 to n − 1 { loop over columns } for i = k + 1 to n { compute multipliers ℓ ik = a ik /a kk for current column } end for j = k + 1 to n for i = k + 1 to n { apply transformation to a ij = a ij − ℓ ik a kj remaining submatrix } end end end Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 5 / 44

LU Factorization Motivation Parallel Algorithms for LU Gaussian Elimination Partial Pivoting Gaussian Elimination Algorithm In general, row interchanges (pivoting) may be required to ensure existence of LU factorization and numerical stability of Gaussian elimination algorithm, but for simplicity we temporarily ignore this issue Gaussian elimination requires about n 3 / 3 paired additions and multiplications, so model serial time as T 1 = γ n 3 / 3 where γ is time required for multiply-add operation About n 2 / 2 divisions also required, but we ignore this lower-order term Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 6 / 44

LU Factorization Motivation Parallel Algorithms for LU Gaussian Elimination Partial Pivoting Loop Orderings for Gaussian Elimination Gaussian elimination has general form of triple-nested loop in which entries of L and U overwrite those of A for for for a ij = a ij − ( a ik /a kk ) a kj end end end Indices i , j , and k of for loops can be taken in any order, for total of 3! = 6 different ways of arranging loops Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 7 / 44

LU Factorization Motivation Parallel Algorithms for LU Gaussian Elimination Partial Pivoting Loop Orderings for Gaussian Elimination Different loop orders have different memory access patterns, which may cause their performance to vary widely Right-looking orderings (loop over k is outermost) perform updates to the trailing matrix (update all a ij for i, j ≥ k ) eagerly Left-looking orderings (loop over k is innermost) update the trailing matrix lazily (updates to a ij done only when all entries a i ′ j ′ with min( i ′ , j ′ ) < min( i, j ) have been updated) Right-looking ordering achieve better read-locality (the same divisor and outer-product vectors are reused) Left-looking ordering achieve better write-locality (entries of A may be changed in memory only once) Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 8 / 44

LU Factorization Motivation Parallel Algorithms for LU Gaussian Elimination Partial Pivoting Gaussian Elimination Algorithm Right-looking form of Gaussian elimination for k = 1 to n − 1 for i = k + 1 to n ℓ ik = a ik /a kk end for j = k + 1 to n for i = k + 1 to n a ij = a ij − ℓ ik a kj end end end Multipliers ℓ ik computed outside inner loop for greater efficiency Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 9 / 44

Fine-Grain Algorithm LU Factorization Agglomeration Schemes Parallel Algorithms for LU Mapping Schemes Partial Pivoting Scalability Parallel Algorithm Partition For i, j = 1 , . . . , n , fine-grain task ( i, j ) stores a ij and computes and stores � u ij , if i ≤ j ℓ ij , if i > j yielding 2-D array of n 2 fine-grain tasks Communicate Broadcast entries of A vertically to tasks below Broadcast entries of L horizontally to tasks to right Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 10 / 44

Fine-Grain Algorithm LU Factorization Agglomeration Schemes Parallel Algorithms for LU Mapping Schemes Partial Pivoting Scalability Fine-Grain Tasks and Communication a 11 a 12 a 13 a 14 a 15 a 16 u 11 u 12 u 13 u 14 u 15 u 16 a 21 a 22 a 23 a 24 a 25 a 26 ℓ 21 u 22 u 23 u 24 u 25 u 26 a 31 a 32 a 33 a 34 a 35 a 36 ℓ 31 ℓ 32 u 33 u 34 u 35 u 36 a 41 a 42 a 43 a 44 a 45 a 46 ℓ 41 ℓ 42 ℓ 43 u 44 u 45 u 46 a 51 a 52 a 53 a 54 a 55 a 56 ℓ 51 ℓ 52 ℓ 53 ℓ 54 u 55 u 56 a 61 a 62 a 63 a 64 a 65 a 66 ℓ 61 ℓ 62 ℓ 63 ℓ 64 ℓ 65 u 66 Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 11 / 44

Fine-Grain Algorithm LU Factorization Agglomeration Schemes Parallel Algorithms for LU Mapping Schemes Partial Pivoting Scalability Fine-Grain Parallel Algorithm for k = 1 to min( i, j ) − 1 recv broadcast of a kj from task ( k, j ) { vert bcast } recv broadcast of ℓ ik from task ( i, k ) { horiz bcast } a ij = a ij − ℓ ik a kj { update entry } end if i ≤ j then broadcast a ij to tasks ( k, j ) , k = i + 1 , . . . , n { vert bcast } else recv broadcast of a jj from task ( j, j ) { vert bcast } ℓ ij = a ij /a jj { multiplier } broadcast ℓ ij to tasks ( i, k ) , k = j + 1 , . . . , n { horiz bcast } end Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 12 / 44

Fine-Grain Algorithm LU Factorization Agglomeration Schemes Parallel Algorithms for LU Mapping Schemes Partial Pivoting Scalability Agglomeration Agglomerate With n × n array of fine-grain tasks, natural strategies are 2-D: combine k × k subarray of fine-grain tasks to form each coarse-grain task, yielding ( n/k ) 2 coarse-grain tasks 1-D column: combine n fine-grain tasks in each column into coarse-grain task, yielding n coarse-grain tasks 1-D row: combine n fine-grain tasks in each row into coarse-grain task, yielding n coarse-grain tasks Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 13 / 44

Fine-Grain Algorithm LU Factorization Agglomeration Schemes Parallel Algorithms for LU Mapping Schemes Partial Pivoting Scalability 2-D Agglomeration a 11 a 12 a 13 a 14 a 15 a 16 u 11 u 12 u 13 u 14 u 15 u 16 a 21 a 22 a 23 a 24 a 25 a 26 ℓ 21 u 22 u 23 u 24 u 25 u 26 a 31 a 32 a 33 a 34 a 35 a 36 ℓ 31 ℓ 32 u 33 u 34 u 35 u 36 a 41 a 42 a 43 a 44 a 45 a 46 ℓ 41 ℓ 42 ℓ 43 u 44 u 45 u 46 a 51 a 52 a 53 a 54 a 55 a 56 ℓ 51 ℓ 52 ℓ 53 ℓ 54 u 55 u 56 a 61 a 62 a 63 a 64 a 65 a 66 ℓ 61 ℓ 62 ℓ 63 ℓ 64 ℓ 65 u 66 Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 14 / 44

Fine-Grain Algorithm LU Factorization Agglomeration Schemes Parallel Algorithms for LU Mapping Schemes Partial Pivoting Scalability Blocked LU factorization A Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 15 / 44

Fine-Grain Algorithm LU Factorization Agglomeration Schemes Parallel Algorithms for LU Mapping Schemes Partial Pivoting Scalability Blocked LU factorization U ₀₀ L ₀₀ Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 16 / 44

Fine-Grain Algorithm LU Factorization Agglomeration Schemes Parallel Algorithms for LU Mapping Schemes Partial Pivoting Scalability Blocked LU factorization U L Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 17 / 44

Fine-Grain Algorithm LU Factorization Agglomeration Schemes Parallel Algorithms for LU Mapping Schemes Partial Pivoting Scalability Blocked LU factorization U L S=A-LU Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 18 / 44

Fine-Grain Algorithm LU Factorization Agglomeration Schemes Parallel Algorithms for LU Mapping Schemes Partial Pivoting Scalability Coarse-Grain 2-D Parallel Algorithm for k = 1 to n − 1 broadcast { a kj : j ∈ mycols , j ≥ k } in processor column if k ∈ mycols then for i ∈ myrows , i > k ℓ ik = a ik /a kk { multipliers } end end broadcast { ℓ ik : i ∈ myrows , i > k } in processor row for j ∈ mycols , j > k for i ∈ myrows , i > k , a ij = a ij − ℓ ik a kj { update } end end end Michael T. Heath and Edgar Solomonik Parallel Numerical Algorithms 19 / 44

Parallel Numerical Algorithms Chapter 3 Dense Linear Systems - PowerPoint PPT Presentation

LU Factorization Parallel Algorithms for LU Partial Pivoting Parallel Numerical Algorithms Chapter 3 Dense Linear Systems Section 3.2 LU Factorization Michael T. Heath and Edgar Solomonik Department of Computer Science University of

Parallel Numerical Algorithms Chapter 2 Parallel Thinking Section 2.2 Parallel

Parallel Numerical Algorithms Chapter 2 Parallel Thinking Section 2.1 Parallel Algorithm

Parallel Numerical Algorithms Chapter 2 Parallel Thinking Section 2.3 Parallel

+ Design of Parallel Algorithms Models of Parallel Computation + Chapter Overview: Algorithms

Parallel Algorithms Parallel Algorithms Examples Examples Concepts & Definitions

+ Design of Parallel Algorithms Parallel Algorithm Analysis Tools + Topic Overview n Sources of

+ Design of Parallel Algorithms Parallel Algorithm Analysis Tools + Topic Overview n Sources

Parallel Numerical Algorithms Chapter 3 Dense Linear Systems Section 3.3 Triangular

Parallel Numerical Algorithms Chapter 3 Dense Linear Systems Section 3.3 Triangular

+ Design of Parallel Algorithms Parallel Sorting Algorithms + Topic Overview n Issues in

Parallel Numerical Algorithms Chapter 1 Parallel Computing Michael T. Heath and Edgar

Parallel Numerical Algorithms Chapter 1 Parallel Computing Michael T. Heath and Edgar

Parallel Numerical Algorithms for Heterogeneous Parallel Computers Antonio M. Vidal Maci a

+ Design of Parallel Algorithms Parallel Dense Matrix Algorithms + Topic Overview n

Introduction to Parallel Computing George Karypis Principles of Parallel Algorithm Design

Overview Why Parallel Sorting? Parallel Quicksort Bitonic Sort Parallel Merge Sort

Decomposition towers and their forcing Alexander Blokh , Michal Misiurewicz

New Results on Permutation Arrays Ivan Hal Sudborough University of Texas at Dallas (joint work

Random permutations (and beyond) Permutation Patterns 2012 Jennie Hansen (Heriot-Watt) 1 Joint

Knots and Permutations Chaim Even-Zohar , Joel Hass, Nati Linial, Tahl Nowik Knots Unknot Trefoil

A polynomial upper bound on Reidemeister moves for each knot type Marc Lackenby August 2013

Daniel Gildea Giorgio Satta University of Rochester Universit di Padova Synchronous CFG

Correlation Decay from Cyclic Polymorphisms Jonah Brown-Cohen 1 Prasad Raghavendra 1 1 UC Berkeley

Enumeration of Pin-Permutations Fr ed erique Bassino Mathilde Bouvel Dominique Rossin ees

Parallel Numerical Algorithms Chapter 3 Dense Linear Systems - PowerPoint PPT Presentation

LU Factorization Parallel Algorithms for LU Partial Pivoting Parallel Numerical Algorithms Chapter 3 Dense Linear Systems Section 3.2 LU Factorization Michael T. Heath and Edgar Solomonik Department of Computer Science University of

Parallel Numerical Algorithms Chapter 2 Parallel Thinking Section 2.2 Parallel

Parallel Numerical Algorithms Chapter 2 Parallel Thinking Section 2.1 Parallel Algorithm

Parallel Numerical Algorithms Chapter 2 Parallel Thinking Section 2.3 Parallel

+ Design of Parallel Algorithms Models of Parallel Computation + Chapter Overview: Algorithms

Parallel Algorithms Parallel Algorithms Examples Examples Concepts &amp; Definitions

+ Design of Parallel Algorithms Parallel Algorithm Analysis Tools + Topic Overview n Sources of

+ Design of Parallel Algorithms Parallel Algorithm Analysis Tools + Topic Overview n Sources

Parallel Numerical Algorithms Chapter 3 Dense Linear Systems Section 3.3 Triangular

Parallel Numerical Algorithms Chapter 3 Dense Linear Systems Section 3.3 Triangular

+ Design of Parallel Algorithms Parallel Sorting Algorithms + Topic Overview n Issues in

Parallel Numerical Algorithms Chapter 1 Parallel Computing Michael T. Heath and Edgar

Parallel Numerical Algorithms Chapter 1 Parallel Computing Michael T. Heath and Edgar

Parallel Numerical Algorithms for Heterogeneous Parallel Computers Antonio M. Vidal Maci a

+ Design of Parallel Algorithms Parallel Dense Matrix Algorithms + Topic Overview n

Introduction to Parallel Computing George Karypis Principles of Parallel Algorithm Design

Overview Why Parallel Sorting? Parallel Quicksort Bitonic Sort Parallel Merge Sort

Decomposition towers and their forcing Alexander Blokh , Michal Misiurewicz

New Results on Permutation Arrays Ivan Hal Sudborough University of Texas at Dallas (joint work

Random permutations (and beyond) Permutation Patterns 2012 Jennie Hansen (Heriot-Watt) 1 Joint

Knots and Permutations Chaim Even-Zohar , Joel Hass, Nati Linial, Tahl Nowik Knots Unknot Trefoil

A polynomial upper bound on Reidemeister moves for each knot type Marc Lackenby August 2013

Daniel Gildea Giorgio Satta University of Rochester Universit di Padova Synchronous CFG

Correlation Decay from Cyclic Polymorphisms Jonah Brown-Cohen 1 Prasad Raghavendra 1 1 UC Berkeley

Enumeration of Pin-Permutations Fr ed erique Bassino Mathilde Bouvel Dominique Rossin ees

Parallel Algorithms Parallel Algorithms Examples Examples Concepts & Definitions