Parallel Iterative Poisson Solver for a Distributed Memory Architecture - PowerPoint PPT Presentation

  1. Parallel Iterative Poisson Solver for a Distributed Memory Architecture. Eric Dow, Aerospace Computational Design Lab, Department of Aeronautics and Astronautics

  2. Motivation • Solving Poisson’s equation is a common sub-problem in many numerical schemes, notably the solution of the incompressible Navier-Stokes equations. • This step is typically the most expensive part of such schemes, so an efficient Poisson solver is essential.

  3. Problem Description • Poisson’s equation on an arbitrary geometry with homogeneous Dirichlet boundary conditions: ∇²u = f(x) for x ∈ Ω, u(x) = 0 for x ∈ ∂Ω.

  4. Iterative Solution Techniques • In 2D, Poisson’s equation can be discretized with finite differences:
     (u_{i+1,j} + u_{i-1,j} + u_{i,j+1} + u_{i,j-1} - 4 u_{i,j}) / Δx² = f_{i,j}
     • This suggests the following iterative scheme, known as the Jacobi Iterative Method:
     u_{i,j}^{n+1} = (1/4) (u_{i+1,j}^n + u_{i-1,j}^n + u_{i,j+1}^n + u_{i,j-1}^n - Δx² f_{i,j})
     • This is rather slow to converge, and can be made faster by using the updated values of the solution as soon as they are available (Gauss-Seidel Method):
     u_{i,j}^{n+1} = (1/4) (u_{i+1,j}^n + u_{i-1,j}^{n+1} + u_{i,j+1}^n + u_{i,j-1}^{n+1} - Δx² f_{i,j})

  5. Iterative Solution Techniques • For very large problems (especially in 3D), a direct solve is impractical. • With an iterative solver, we can simply stop iterating once the desired level of accuracy is achieved; this is not possible with direct solution techniques such as LU factorization. ▫ Potential to save a great deal of computational effort.

  6. Parallelization • The Jacobi method seems like a poor choice relative to the Gauss-Seidel method: ▫ Slower to converge ▫ Requires twice as much storage • However, parallelization of the Jacobi method is straightforward. ▫ Inherent data parallelism: the same operations are performed on each grid point, so it makes sense to distribute the data among processes. ▫ All values can be updated simultaneously. • We need to be more clever with the Gauss-Seidel method…

  7. Red-Black Node Ordering • If the sum of the row and column index of a node is even, the node is colored red; otherwise the node is colored black. ▫ Update all of the red nodes in parallel using the values at black nodes. ▫ Update all of the black nodes in parallel using the values at red nodes. • This restores data parallelism.

  8. Distributing the Data • Spectral graph partitioning: recursively divide the domain into (roughly) equal pieces.

  9. Distributing the Data • This scheme does not produce an optimal partition, i.e., one that creates partitions of equal size while minimizing the number of edge cuts. • Result: large variation in the size of the boundaries between subdomains. ▫ This creates a communication bottleneck: some processes end up waiting on others to finish communicating.

  10. Implementation • Serial and parallel solvers implemented in C; MPI used for parallelization. ▫ Each process is given a collection of nodes to update. ▫ At the end of each iteration, each process sends and receives the values needed for the next iteration. ▫ A call to MPI_Barrier is required at the end of each communication block to keep faster processes from racing ahead. • Solvers run on a Beowulf cluster with 1, 2, and 4 nodes.

  11. Results • The serial and parallel codes agree on the steady-state solution.

  12. Results: Jacobi Method (plots for 75 × 75 and 150 × 150 node grids)

  13. Results: Gauss-Seidel Method (plots for 75 × 75 and 150 × 150 node grids)

  14. Conclusions • Speedup is highly dependent on problem size. ▫ Doubling the number of grid points in each dimension from 75 to 150 quadruples the workload of each process but only doubles the amount of communication required, which explains the speedup observed. ▫ This is actually good news: iterative solvers are typically used only for very large problems, and since the speedup increases with problem size, it makes sense to parallelize them. • Jacobi outperforms Gauss-Seidel in parallel performance due to its lower communication requirements.

  15. Future Work • Multigrid: a very efficient iterative solution technique built around basic iterative solvers such as Jacobi and Gauss-Seidel. • The parallel component is already in place; simply integrate the parallel solvers to create a parallel multigrid method. • Integrate the graph partitioning scheme into the solver (currently a collection of separate MATLAB functions). • Speedup ceiling: is there a maximum attainable speedup as the problem size increases (other than the obvious ideal one-to-one speedup)?

  16. References [1] G. Strang, Computational Science and Engineering. Wellesley, MA: Wellesley-Cambridge Press, 2007.
