Use of parallel matrix algorithms for Laplace partial differential - PDF document

CS140 V-1: Use of parallel matrix algorithms for Laplace partial differential equations

A steady-state heat-flow problem on a rectangular 10 cm × 20 cm metal sheet: one edge is held at 100 degrees and the other three edges are held at 0 degrees. What are the steady-state temperatures at the interior points?

[Figure: the 20 cm × 10 cm sheet with interior grid points u11, u21, u31 in a row along the x-axis; the edge at x = 20 cm is held at 100 degrees, the other three edges at 0 degrees.]

CS, UCSB Tao Yang

CS140 V-2: The mathematical model

Laplace equation:
$$\frac{\partial^2 u(x,y)}{\partial x^2} + \frac{\partial^2 u(x,y)}{\partial y^2} = 0$$
with the boundary conditions
$$u(x,0) = 0, \quad u(x,10) = 0, \quad u(0,y) = 0, \quad u(20,y) = 100.$$

Finite difference method to solve this PDE:
• Discretize the region: divide the domain into a grid with spacing $h$ along each axis.
• At each grid point $(ih, jh)$, let $u(ih, jh) = u_{i,j}$, and set up a linear equation using an approximate formula for numerical differentiation.
• Solve the resulting linear equations to find the values $u_{i,j}$ at all grid points.
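A minimal Python sketch of the discretization step. It assumes a spacing of h = 5 cm: the slides do not state h, but this value yields exactly the three interior points $u_{11}, u_{21}, u_{31}$ of the example. The function name `interior_points` is hypothetical; index i runs along the 20 cm side and j along the 10 cm side.

```python
def interior_points(width, height, h):
    """List (i, j, x, y) for every interior grid point (x, y) = (i*h, j*h).

    Points with i == 0, j == 0, x == width or y == height lie on the
    boundary, where u is given, so they are excluded.
    """
    nx = round(width / h)    # number of intervals along x
    ny = round(height / h)   # number of intervals along y
    return [(i, j, i * h, j * h)
            for j in range(1, ny)
            for i in range(1, nx)]

for i, j, x, y in interior_points(width=20, height=10, h=5):
    print(f"u{i}{j} at (x, y) = ({x} cm, {y} cm)")
```

Halving h to 2.5 cm would give 7 × 3 = 21 interior unknowns instead of 3, which is why the systems grow quickly as the grid is refined.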

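CS140 V-3: Approximating numerical differentiation

$$f'(x) \approx \frac{f(x+h) - f(x)}{h} \quad \text{or} \quad f'(x) \approx \frac{f(x) - f(x-h)}{h}$$

Using a forward difference for $f''$ and the backward difference for each $f'$:
$$f''(x) \approx \frac{f'(x+h) - f'(x)}{h} \approx \frac{\dfrac{f(x+h) - f(x)}{h} - \dfrac{f(x) - f(x-h)}{h}}{h}.$$

Thus
$$f''(x) \approx \frac{f(x+h) + f(x-h) - 2f(x)}{h^2}.$$

Then
$$\frac{\partial^2 u(x_i, y_j)}{\partial x^2} \approx \frac{u_{i+1,j} - 2u_{i,j} + u_{i-1,j}}{h^2}, \qquad \frac{\partial^2 u(x_i, y_j)}{\partial y^2} \approx \frac{u_{i,j+1} - 2u_{i,j} + u_{i,j-1}}{h^2}.$$

Adding the two approximations, setting the sum to zero as the Laplace equation requires, and multiplying by $h^2$:
$$u_{i+1,j} - 2u_{i,j} + u_{i-1,j} + u_{i,j+1} - 2u_{i,j} + u_{i,j-1} = 0.$$

Then, multiplying by $-1$:
$$4u_{i,j} - u_{i+1,j} - u_{i-1,j} - u_{i,j+1} - u_{i,j-1} = 0.$$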
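A quick numerical check of the second-difference formula, as a sketch using f(x) = sin x as an arbitrary test function (its exact second derivative is −sin x). The approximation is second-order accurate, so halving h should cut the error by roughly a factor of 4.

```python
import math

def second_difference(f, x, h):
    """Approximate f''(x) by (f(x + h) + f(x - h) - 2 f(x)) / h**2."""
    return (f(x + h) + f(x - h) - 2 * f(x)) / h**2

x = 1.0
exact = -math.sin(x)                 # (sin x)'' = -sin x
for h in (0.1, 0.05, 0.025):
    err = abs(second_difference(math.sin, x, h) - exact)
    print(f"h = {h:<6} |error| = {err:.2e}")
```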

CS140 V-4: Example of derived linear heat equations

[Figure: the same sheet and interior points u11, u21, u31 as in V-1.]

For this case, let $u_{11} = x_1$, $u_{21} = x_2$, $u_{31} = x_3$. Applying $4u_{i,j} - u_{i+1,j} - u_{i-1,j} - u_{i,j+1} - u_{i,j-1} = 0$ at each interior point:

At $u_{11}$: $4x_1 - x_2 - 0 - 0 - 0 = 0$
At $u_{21}$: $4x_2 - x_3 - x_1 - 0 - 0 = 0$
At $u_{31}$: $4x_3 - 100 - x_2 - 0 - 0 = 0$

$$\begin{pmatrix} 4 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 100 \end{pmatrix}$$

Solutions: $x_1 = 1.786$, $x_2 = 7.143$, $x_3 = 26.786$.
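Because this coefficient matrix is tridiagonal, the 3 × 3 system can be solved directly with the Thomas algorithm (Gaussian elimination specialized to tridiagonal matrices). The sketch below is one way to do it, not a method the slides prescribe; the function and argument names are illustrative.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system with the Thomas algorithm.

    a: sub-diagonal (a[0] unused), b: diagonal,
    c: super-diagonal (c[-1] unused), d: right-hand side.
    """
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                          # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):                 # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

x = thomas(a=[0, -1, -1], b=[4, 4, 4], c=[-1, -1, 0], d=[0, 0, 100])
print([round(v, 3) for v in x])   # [1.786, 7.143, 26.786]
```

The exact solution is $x_1 = 100/56$, $x_2 = 400/56$, $x_3 = 1500/56$, which rounds to the values above.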

CS140 V-5: Linear heat equations for a general 2D grid

Given a general $(n + 2) \times (n + 2)$ grid ($n \times n$ interior points surrounded by one layer of boundary points), we have $n^2$ equations:
$$4u_{i,j} - u_{i+1,j} - u_{i-1,j} - u_{i,j+1} - u_{i,j-1} = 0, \quad 1 \le i, j \le n.$$
Equivalently, each interior value is the average of its four neighbors:
$$u_{i,j} = \frac{u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1}}{4}.$$

[Figure: Example, r = 2, n = 6; one edge held at temperature U1, the other three edges held at U0.]
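One way to check a candidate grid against the $n^2$ equations is to compute the largest residual $|4u_{i,j} - u_{i+1,j} - u_{i-1,j} - u_{i,j+1} - u_{i,j-1}|$ over the interior. This sketch assumes the grid is stored as an $(n+2) \times (n+2)$ list of lists, boundary included; `max_residual` is a hypothetical name. As a sanity test it uses a linear field $u_{i,j} = i$ (not the heat problem itself), which satisfies every equation exactly.

```python
def max_residual(u):
    """Largest |4u[i][j] - u[i+1][j] - u[i-1][j] - u[i][j+1] - u[i][j-1]|
    over the interior of an (n+2) x (n+2) grid (list of lists)."""
    m = len(u)                                   # m = n + 2
    worst = 0.0
    for i in range(1, m - 1):
        for j in range(1, m - 1):
            r = (4 * u[i][j] - u[i + 1][j] - u[i - 1][j]
                 - u[i][j + 1] - u[i][j - 1])
            worst = max(worst, abs(r))
    return worst

n = 6
linear = [[float(i)] * (n + 2) for i in range(n + 2)]   # u[i][j] = i
print(max_residual(linear))                  # 0.0

bumped = [[0.0] * (n + 2) for _ in range(n + 2)]
bumped[3][3] = 1.0                           # violates the equation at (3, 3)
print(max_residual(bumped))                  # 4.0
```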

CS140 V-6

We order the unknowns by the first index, then by the second:
$$(u_{11}, u_{12}, \ldots, u_{1n}, u_{21}, u_{22}, \ldots, u_{2n}, \ldots, u_{n1}, \ldots, u_{nn}).$$
For $n = 2$, the ordering is
$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} u_{11} \\ u_{12} \\ u_{21} \\ u_{22} \end{pmatrix}.$$
The system is
$$\begin{pmatrix} 4 & -1 & -1 & 0 \\ -1 & 4 & 0 & -1 \\ -1 & 0 & 4 & -1 \\ 0 & -1 & -1 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} u_{01} + u_{10} \\ u_{02} + u_{13} \\ u_{20} + u_{31} \\ u_{32} + u_{23} \end{pmatrix},$$
where each right-hand-side entry is the sum of the known boundary neighbors of that unknown.
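A short sketch that lists, for each unknown in this ordering, the boundary neighbors that move to the right-hand side. For $n = 2$ it gives $u_{02} + u_{13}$ for $x_2 = u_{12}$ and $u_{20} + u_{31}$ for $x_3 = u_{21}$. The function name is hypothetical, and the labels are built by concatenating the two indices, so they are only unambiguous for $n \le 8$.

```python
def rhs_terms(n):
    """For each unknown in the order (u11, u12, ..., unn), list the
    boundary neighbors whose known values go to the right-hand side."""
    def on_boundary(i, j):
        return i in (0, n + 1) or j in (0, n + 1)

    terms = []
    for i in range(1, n + 1):          # outer index i, inner index j
        for j in range(1, n + 1):
            nbrs = [(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)]
            terms.append(sorted(f"u{a}{b}" for a, b in nbrs if on_boundary(a, b)))
    return terms

for k, t in enumerate(rhs_terms(2), start=1):
    print(f"x{k}: {' + '.join(t)}")
```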

CS140 V-7

In general, the left-hand-side matrix is
$$A = \begin{pmatrix} T & -I & & & \\ -I & T & -I & & \\ & -I & T & -I & \\ & & \ddots & \ddots & \ddots \\ & & & -I & T \end{pmatrix}_{n^2 \times n^2}, \qquad T = \begin{pmatrix} 4 & -1 & & & \\ -1 & 4 & -1 & & \\ & -1 & 4 & -1 & \\ & & \ddots & \ddots & \ddots \\ & & & -1 & 4 \end{pmatrix}_{n \times n}.$$
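The block structure can be generated directly from the 5-point equations. This sketch (hypothetical `laplace_matrix`) builds $A$ as a dense list of lists, which is fine for tiny $n$ but wasteful for large $n$; for $n = 2$ it reproduces the 4 × 4 matrix on slide V-6.

```python
def laplace_matrix(n):
    """Dense n^2 x n^2 coefficient matrix for the unknowns ordered
    (u11, ..., u1n, u21, ..., unn): 4 on the diagonal, -1 for each
    interior neighbor, 0 elsewhere (boundary neighbors go to the RHS)."""
    def idx(i, j):                     # 1-based (i, j) -> 0-based row/column
        return (i - 1) * n + (j - 1)

    N = n * n
    A = [[0] * N for _ in range(N)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            r = idx(i, j)
            A[r][r] = 4
            for a, b in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if 1 <= a <= n and 1 <= b <= n:
                    A[r][idx(a, b)] = -1
    return A

for row in laplace_matrix(2):
    print(row)
```

For $n = 3$, entry (u13, u21) is 0: the tridiagonal pattern of $T$ does not continue across block boundaries, while (u11, u21) is −1 from the $-I$ block.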

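CS140 V-8

$$I = \begin{pmatrix} 1 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{pmatrix}_{n \times n}$$
is the $n \times n$ identity matrix.

The matrix is very sparse: each row has at most 5 nonzeros, so $A$ has fewer than $5n^2$ nonzeros out of $n^4$ entries. Direct methods do not exploit this structure well. Dense Gaussian elimination on the $n^2 \times n^2$ system costs $O(n^6)$ operations, and even banded elimination (bandwidth $n$) costs $O(n^4)$ because fill-in occurs inside the band, so direct methods take too much time. Iterative methods, which touch only the nonzeros in each step, are preferred.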
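To put the sparsity in numbers, this sketch counts the nonzeros of $A$ ($n$ blocks $T$ with $3n - 2$ nonzeros each plus $2(n-1)$ blocks $-I$ with $n$ each, i.e. $5n^2 - 4n$) and prints rough operation counts for $n = 500$, the size used in the performance slides. The counts drop constant factors and are only meant for comparison.

```python
def nonzeros(n):
    """Nonzeros of the n^2 x n^2 matrix: n diagonal blocks T with 3n - 2
    nonzeros each, plus 2(n - 1) off-diagonal blocks -I with n each."""
    return n * (3 * n - 2) + 2 * (n - 1) * n      # = 5n^2 - 4n

n = 500
N = n * n                                          # number of unknowns
print(f"unknowns N         : {N:,}")
print(f"matrix entries N^2 : {N * N:,}")
print(f"nonzeros           : {nonzeros(n):,}")
# Rough operation counts (constant factors dropped):
print(f"dense elimination  ~ N^3   = {float(N) ** 3:.1e}")
print(f"banded elimination ~ N n^2 = {float(N) * n * n:.1e}")
print(f"one Jacobi sweep   ~ 5N    = {5.0 * N:.1e}")
```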

CS140 V-9: The Jacobi iterative method

Given
$$4u_{i,j} - u_{i+1,j} - u_{i-1,j} - u_{i,j+1} - u_{i,j-1} = 0, \quad 1 \le i, j \le n,$$
the Jacobi program computes every new value from the old values only:

Repeat
  For i = 1 to n
    For j = 1 to n
      unew[i][j] = 0.25 * (u[i+1][j] + u[i-1][j] + u[i][j+1] + u[i][j-1])
    EndFor
  EndFor
  d = max over i, j of |unew[i][j] - u[i][j]|
  u = unew
Until d < ε

This is called a 5-point stencil computation: the equation at $u_{i,j}$ involves $u_{i,j}$ and its 4 neighbors.
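A runnable sketch of the Jacobi loop, applied to the V-4 example (three interior points, h = 5 cm assumed). The grid layout (a list of lists indexed [i][j] that includes the boundary, with i along the 20 cm side), the tolerance, and the function name are assumptions; the convergence test uses the maximum change over the grid.

```python
def jacobi(u, eps=1e-6, max_iters=100_000):
    """Jacobi iteration for the 5-point Laplace equations.

    u is a list of lists holding the whole grid; the first and last
    rows/columns are fixed boundary values. Returns (grid, sweeps).
    """
    rows, cols = len(u), len(u[0])
    for sweep in range(1, max_iters + 1):
        new = [row[:] for row in u]            # unew; boundary copied as-is
        diff = 0.0
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                new[i][j] = 0.25 * (u[i + 1][j] + u[i - 1][j]
                                    + u[i][j + 1] + u[i][j - 1])
                diff = max(diff, abs(new[i][j] - u[i][j]))
        u = new                                # u = unew
        if diff < eps:
            return u, sweep
    raise RuntimeError("Jacobi did not converge")

# V-4 example: i runs along x (0..4), j along y (0..2), h = 5 cm.
grid = [[0.0] * 3 for _ in range(5)]
grid[4] = [100.0] * 3                          # edge x = 20 cm held at 100
grid, sweeps = jacobi(grid)
print([round(grid[i][1], 3) for i in (1, 2, 3)], sweeps)
```

With ε = 10⁻⁶ the iterates agree with the direct solution of V-4 to three decimals.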

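CS140 V-10: The Gauss-Seidel method

Repeat
  uold = u
  For i = 1 to n
    For j = 1 to n
      u[i][j] = 0.25 * (u[i+1][j] + u[i-1][j] + u[i][j+1] + u[i][j-1])
    EndFor
  EndFor
Until max over i, j of |u[i][j] - uold[i][j]| < ε

The update is done in place, so u[i-1][j] and u[i][j-1] already hold values from the current sweep. For this model problem Gauss-Seidel needs fewer sweeps than Jacobi, but each update depends on values computed just before it in the same sweep, which makes it harder to parallelize.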
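The only difference from Jacobi is where the neighbor values are read from. This sketch (hypothetical `relax`, same assumed grid layout and V-4 example as the Jacobi discussion) switches between the two with one flag, so the sweep counts can be compared on the same problem.

```python
def relax(u, in_place, eps=1e-6, max_iters=100_000):
    """Iterate the 5-point update until the max change is below eps.

    in_place=True  -> Gauss-Seidel (new values used as soon as computed)
    in_place=False -> Jacobi (each sweep reads only the previous sweep)
    Returns (grid, number_of_sweeps).
    """
    u = [row[:] for row in u]
    rows, cols = len(u), len(u[0])
    for sweep in range(1, max_iters + 1):
        old = [row[:] for row in u]              # uold
        src = u if in_place else old             # where neighbors are read
        for i in range(1, rows - 1):
            for j in range(1, cols - 1):
                u[i][j] = 0.25 * (src[i + 1][j] + src[i - 1][j]
                                  + src[i][j + 1] + src[i][j - 1])
        diff = max(abs(u[i][j] - old[i][j])
                   for i in range(1, rows - 1) for j in range(1, cols - 1))
        if diff < eps:
            return u, sweep
    raise RuntimeError("no convergence")

grid = [[0.0] * 3 for _ in range(5)]
grid[4] = [100.0] * 3                            # V-4 example, h = 5 cm
gs, gs_sweeps = relax(grid, in_place=True)
jac, jac_sweeps = relax(grid, in_place=False)
print("Gauss-Seidel:", [round(gs[i][1], 3) for i in (1, 2, 3)], gs_sweeps)
print("Jacobi:      ", [round(jac[i][1], 3) for i in (1, 2, 3)], jac_sweeps)
```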

CS140 V-11: Parallel Jacobi method

Assume we have a mesh of $n \times n$ processors, and assign $u_{i,j}$ to processor $p_{i,j}$. The SPMD Jacobi program at processor $p_{i,j}$:

Repeat
  Collect u[i+1][j], u[i-1][j], u[i][j+1], u[i][j-1] from p(i+1,j), p(i-1,j), p(i,j+1), p(i,j-1).
  unew[i][j] = 0.25 * (u[i+1][j] + u[i-1][j] + u[i][j+1] + u[i][j-1])
  diff[i][j] = |unew[i][j] - u[i][j]|
  u[i][j] = unew[i][j]
  Do a global reduction to get the maximum of diff[i][j] over all processors as M.
Until M < ε
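The global reduction can be organized as a binary tree: in each round, pairs of processors combine their values with one message, so $P$ values are reduced in $\lceil \log_2 P \rceil$ rounds. The sketch below simulates this sequentially in plain Python (no message-passing library); `tree_max` and the sample diff values are hypothetical. In an MPI program this step would typically be a single `MPI_Allreduce` call with `MPI_MAX`, which also delivers $M$ to every processor, as the Until test requires.

```python
import math

def tree_max(values):
    """Simulate a binary-tree max-reduction across len(values) processors.

    In each round, processor k receives from processor k + stride and
    keeps the larger value; stride doubles each round.
    Returns (maximum, rounds).
    """
    vals = list(values)          # vals[k] = local diff held by processor k
    p = len(vals)
    stride, rounds = 1, 0
    while stride < p:
        for k in range(0, p - stride, 2 * stride):
            vals[k] = max(vals[k], vals[k + stride])   # one message
        stride *= 2
        rounds += 1
    return vals[0], rounds

diffs = [0.3, 0.01, 2.5, 0.7, 0.02, 1.1, 0.05]   # hypothetical local diffs
M, rounds = tree_max(diffs)
print(M, rounds, math.ceil(math.log2(len(diffs))))
```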

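CS140 V-12: Performance evaluation

• Each computation step takes 5 operations; let $\omega$ be its time.
• Each processor receives 4 messages of one value each. Assuming they are received sequentially, communication costs $4(\alpha + \beta)$, where $\alpha$ is the per-message startup cost and $\beta$ the cost per value transferred.
• Assume that the global reduction takes $(\alpha + \beta)\log n$.
• The sequential time is $Seq = K\omega n^2$, where $K$ is the number of steps.
• The parallel time is $PT = K(\omega + (4 + \log n)(\alpha + \beta))$.
• Assume $\omega = 0.5$, $\beta = 0.1$, $\alpha = 100$, $n = 500$ (so $n^2 = 250{,}000$ processors), and take logarithms base 2.

$$\text{Speedup} = \frac{\omega n^2}{\omega + (4 + \log n)(\alpha + \beta)} \approx \frac{125000}{1298} \approx 96, \qquad \text{Efficiency} = \frac{\text{Speedup}}{n^2} \approx 0.04\%.$$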
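A few lines of Python make this model easy to evaluate for other parameter choices. This is a sketch: the function name is hypothetical, and it assumes base-2 logarithms, which the slide does not specify. Since both $Seq$ and $PT$ are proportional to $K$, the number of steps cancels.

```python
import math

def fine_grain_model(omega, alpha, beta, n):
    """Speedup and efficiency of parallel Jacobi with one grid point per
    processor on an n x n mesh (n^2 processors).

    Seq = K*omega*n^2 and PT = K*(omega + (4 + log n)*(alpha + beta)),
    so K cancels. Logarithms are taken base 2 (an assumption).
    """
    pt = omega + (4 + math.log2(n)) * (alpha + beta)
    speedup = omega * n**2 / pt
    return speedup, speedup / n**2

s, e = fine_grain_model(omega=0.5, alpha=100, beta=0.1, n=500)
print(f"speedup ≈ {s:.1f}, efficiency ≈ {100 * e:.3f}%")
```

The communication term $(4 + \log n)(\alpha + \beta)$ is more than 2000 times larger than $\omega$ here, which is why one grid point per processor is far too fine-grained.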

CS140 V-13: Grid partitioning

• Reduce the number of processors and increase the granularity of computation.
• Map the $n \times n$ grid to processors using the 2D block method: assume a $p \times p$ processor mesh in which each processor holds a $\gamma \times \gamma$ block, $\gamma = n/p$ (the figure and the performance analysis on V-16 write this block size as $r$).

[Figure: Example, r = 2, n = 6; one edge held at temperature U1, the other three edges held at U0.]
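Under the 2D block method, the owner of a grid point is a one-line index formula. This sketch (hypothetical `owner`) prints the owning processor of every interior point for the figure's example, $n = 6$ with blocks of size 2, i.e. a 3 × 3 mesh; it assumes $p$ divides $n$.

```python
def owner(i, j, gamma):
    """Mesh coordinates (bi, bj), 1-based, of the processor holding u[i][j]
    under the 2D block mapping with gamma x gamma blocks (1 <= i, j <= n)."""
    return ((i - 1) // gamma + 1, (j - 1) // gamma + 1)

n, p = 6, 3
gamma = n // p                                   # = 2; assumes p divides n
for i in range(1, n + 1):
    print(" ".join("p%d%d" % owner(i, j, gamma) for j in range(1, n + 1)))
```

Block $(b_i, b_j)$ therefore covers rows $(b_i - 1)\gamma + 1$ through $b_i\gamma$ and the corresponding columns, the same ranges the partitioned loops use.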

CS140 V-14: Code partitioning

Rewrite the kernel of the sequential code as:

For bi = 1 to p
  For bj = 1 to p
    For i = (bi - 1)γ + 1 to bi·γ
      For j = (bj - 1)γ + 1 to bj·γ
        unew[i][j] = 0.25 * (u[i+1][j] + u[i-1][j] + u[i][j+1] + u[i][j-1])
      EndFor
    EndFor
  EndFor
EndFor

The two outer loops run over the $p \times p$ blocks; the two inner loops visit the $\gamma \times \gamma$ grid points of block $(b_i, b_j)$, which is the work one processor will own.
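Splitting the loops into blocks only changes the visiting order, not the result. This sketch checks that on a random grid by comparing a plain Jacobi sweep with the blocked version; both function names are hypothetical, and $p$ is assumed to divide $n$.

```python
import random

def sweep_plain(u, n):
    """One Jacobi sweep over the interior 1..n x 1..n of an (n+2)-square grid."""
    new = [row[:] for row in u]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            new[i][j] = 0.25 * (u[i + 1][j] + u[i - 1][j]
                                + u[i][j + 1] + u[i][j - 1])
    return new

def sweep_blocked(u, n, p):
    """The same sweep, with the loops split into p x p blocks of gamma x gamma."""
    gamma = n // p                                   # assumes p divides n
    new = [row[:] for row in u]
    for bi in range(1, p + 1):
        for bj in range(1, p + 1):
            for i in range((bi - 1) * gamma + 1, bi * gamma + 1):
                for j in range((bj - 1) * gamma + 1, bj * gamma + 1):
                    new[i][j] = 0.25 * (u[i + 1][j] + u[i - 1][j]
                                        + u[i][j + 1] + u[i][j - 1])
    return new

rng = random.Random(0)
n, p = 6, 3
u = [[rng.random() for _ in range(n + 2)] for _ in range(n + 2)]
print(sweep_blocked(u, n, p) == sweep_plain(u, n))   # True
```

Because every point reads only values from the previous sweep, the blocks can be processed in any order, or simultaneously, which is what makes Jacobi easy to distribute.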

CS140 V-15: Parallel SPMD code

On processor p(bi, bj):

Repeat
  Collect the boundary data from its four neighbors (each neighboring block sends the γ values along the shared edge).
  For i = (bi - 1)γ + 1 to bi·γ
    For j = (bj - 1)γ + 1 to bj·γ
      unew[i][j] = 0.25 * (u[i+1][j] + u[i-1][j] + u[i][j+1] + u[i][j-1])
    EndFor
  EndFor
  Compute the local maximum diff(bi, bj) of |unew - u| over the block, then set u = unew.
  Do a global reduction to get the maximum of diff(bi, bj) over all processors as M.
Until M < ε
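This sketch simulates the blocked SPMD loop sequentially in plain Python rather than with a real message-passing library such as MPI. Each simulated processor keeps its block in a private array with a one-cell ghost layer, the neighbor exchange copies edge rows and columns into that layer, and the global reduction is simply a max over the local values. The function name and the ghost-layer layout are assumptions. Because every point is updated with exactly the same arithmetic as sequential Jacobi, the result with $p = 3$ should match the single-block ($p = 1$) run exactly.

```python
def spmd_jacobi(u, n, p, eps=1e-6, max_iters=100_000):
    """Simulate blocked SPMD Jacobi on a p x p processor mesh.

    u is the (n+2) x (n+2) global grid whose outer rows/columns are fixed
    boundary values. Processor (bi, bj) (0-based here) owns a private
    (g+2) x (g+2) array: its g x g block plus a one-cell ghost layer,
    g = n // p. Returns (gathered grid, iterations).
    """
    g = n // p                                       # gamma; assumes p divides n
    # Scatter: local[bi][bj][a][b] starts as a copy of u[bi*g + a][bj*g + b].
    local = [[[[u[bi * g + a][bj * g + b] for b in range(g + 2)]
               for a in range(g + 2)]
              for bj in range(p)] for bi in range(p)]
    for it in range(1, max_iters + 1):
        # 1. Collect data from the four neighbors: copy each neighbor's edge
        #    row/column into the ghost layer (physical boundary stays fixed).
        for bi in range(p):
            for bj in range(p):
                blk = local[bi][bj]
                for k in range(1, g + 1):
                    if bi > 0:
                        blk[0][k] = local[bi - 1][bj][g][k]
                    if bi < p - 1:
                        blk[g + 1][k] = local[bi + 1][bj][1][k]
                    if bj > 0:
                        blk[k][0] = local[bi][bj - 1][k][g]
                    if bj < p - 1:
                        blk[k][g + 1] = local[bi][bj + 1][k][1]
        # 2. Each processor sweeps its own block and records its local max diff.
        local_max = []
        for bi in range(p):
            for bj in range(p):
                blk = local[bi][bj]
                new = [row[:] for row in blk]
                d = 0.0
                for a in range(1, g + 1):
                    for b in range(1, g + 1):
                        new[a][b] = 0.25 * (blk[a + 1][b] + blk[a - 1][b]
                                            + blk[a][b + 1] + blk[a][b - 1])
                        d = max(d, abs(new[a][b] - blk[a][b]))
                local[bi][bj] = new
                local_max.append(d)
        # 3. Global max-reduction (here just max over the list), then test.
        if max(local_max) < eps:
            break
    else:
        raise RuntimeError("did not converge")
    # Gather the blocks back into a global grid.
    out = [row[:] for row in u]
    for bi in range(p):
        for bj in range(p):
            for a in range(1, g + 1):
                for b in range(1, g + 1):
                    out[bi * g + a][bj * g + b] = local[bi][bj][a][b]
    return out, it


n = 6
u = [[0.0] * (n + 2) for _ in range(n + 2)]
u[n + 1] = [100.0] * (n + 2)             # one edge held at 100, the others at 0
par, par_iters = spmd_jacobi(u, n, p=3)
seq, seq_iters = spmd_jacobi(u, n, p=1)  # a single block is plain Jacobi
print(par == seq, par_iters == seq_iters)
```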

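CS140 V-16: Performance evaluation

• On each processor, each computation step takes $\omega r^2$ time, where $r = n/p$ is the block size ($\gamma$ in V-14 and V-15).
• The communication cost is $4(\alpha + r\beta)$: four messages of $r$ values each.
• Assume that the global reduction takes $(\alpha + \beta)\log p$; the formula below bounds each reduction step by $\alpha + r\beta$ for simplicity.
• The number of steps is $K$.
• Assume $\omega = 0.5$, $\beta = 0.1$, $\alpha = 100$, $n = 500$, $r = 100$, $p^2 = 25$ processors, and take logarithms base 2.

$$PT = K\left(r^2\omega + (4 + \log p)(\alpha + r\beta)\right)$$
$$\text{Speedup} = \frac{\omega r^2 p^2}{r^2\omega + (4 + \log p)(\alpha + r\beta)} \approx \frac{125000}{5695} \approx 21.9, \qquad \text{Efficiency} = \frac{\text{Speedup}}{p^2} \approx 88\%.$$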
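The same kind of sketch evaluates the blocked model and makes the granularity trade-off visible by trying several mesh sizes $p$ that divide $n = 500$. The function name is hypothetical and base-2 logarithms are an assumption; $K$ cancels as before.

```python
import math

def block_model(omega, alpha, beta, n, p):
    """Speedup/efficiency of blocked Jacobi on a p x p mesh, block size r = n/p.

    PT = K*(r^2*omega + (4 + log2 p)*(alpha + r*beta)); K cancels.
    Logarithms are taken base 2 (an assumption).
    """
    r = n / p
    pt = r * r * omega + (4 + math.log2(p)) * (alpha + r * beta)
    speedup = omega * n * n / pt
    return speedup, speedup / p**2

for p in (2, 5, 10, 25, 50):             # mesh sides that divide n = 500
    s, e = block_model(omega=0.5, alpha=100, beta=0.1, n=500, p=p)
    print(f"p = {p:>2} ({p * p:>4} procs): speedup ≈ {s:6.1f}, efficiency ≈ {100 * e:5.1f}%")
```

Under this model the speedup keeps rising slowly as the mesh grows, while efficiency falls steadily, from about 98% at p = 2 to about 5% at p = 50, because the per-message cost α does not shrink with the block size.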
