
Simple Steps for Parallelizing a FORTRAN Code Using Message Passing Interface (MPI)
Justin L. Morgan and Jason B. Gilbert


  1. Simple Steps for Parallelizing a FORTRAN Code Using Message Passing Interface (MPI)
     Justin L. Morgan and Jason B. Gilbert
     Department of Aerospace Engineering, Auburn University

     Why Parallelize?
     - To decrease the overall computation time of a job.
     - To decrease the per-processor memory usage.
     - As William Gropp states in Using MPI, "To pull a bigger wagon, it is easier to add more oxen than to grow a gigantic ox."

  2. Physical Problem Formulation
     - Determine the temperature distribution in a flat plate with a temperature of 300 K applied to three edges and 500 K applied to the fourth edge.

     Governing Equation
     - Conservation of Energy (differential conservation form):

       \dot{E}_{in} - \dot{E}_{out} + \dot{E}_{g} = \dot{E}_{st}

     - Assumptions made:
       - Front and back faces are perfectly insulated
       - Steady conditions
       - No energy transformation
     - With these assumptions, the energy balance reduces to the two-dimensional Laplace equation, \partial^2 T/\partial x^2 + \partial^2 T/\partial y^2 = 0, which is the equation discretized next.

  3. Discretization
     - Point Jacobi Method: iteratively solve for T_{i,j}^{k+1} from

       (T_{i+1,j}^k - 2T_{i,j}^{k+1} + T_{i-1,j}^k)/\Delta x^2 + (T_{i,j+1}^k - 2T_{i,j}^{k+1} + T_{i,j-1}^k)/\Delta y^2 \cong 0

     Implementation in FORTRAN
     - Dimension arrays
     - Set initial and boundary conditions
     - Begin iterative process
     - Monitor convergence
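     A minimal serial sketch of the steps above (dimension arrays, set conditions, iterate, monitor convergence), assuming a unit square plate, a 50 x 50 node mesh, and a simple maximum-relative-change stopping test. The array names, mesh size, boundary orientation, and tolerance are illustrative assumptions, not taken from the authors' code.

      PROGRAM JACOBI_SERIAL
      IMPLICIT NONE
      INTEGER, PARAMETER :: IMAX = 50, JMAX = 50, KMAX = 100000
      DOUBLE PRECISION, PARAMETER :: DX = 1.0D0/(IMAX-1), DY = 1.0D0/(JMAX-1)
      DOUBLE PRECISION, PARAMETER :: TOL = 1.0D-6
      DOUBLE PRECISION :: T(IMAX,JMAX), TNEW(IMAX,JMAX), ERR
      INTEGER :: I, J, K

!     Set initial and boundary conditions: 300 K everywhere, 500 K on one edge
      T = 300.0D0
      T(:,JMAX) = 500.0D0
      TNEW = T

!     Begin iterative process (Point Jacobi)
      DO K = 1, KMAX
         DO J = 2, JMAX-1
            DO I = 2, IMAX-1
!              Solve the difference equation for T(i,j) at the new iteration level
               TNEW(I,J) = ( DX*DX*(T(I,J+1) + T(I,J-1))                   &
                           + DY*DY*(T(I+1,J) + T(I-1,J)) )                 &
                           / ( 2.0D0*(DX*DX + DY*DY) )
            END DO
         END DO
!        Monitor convergence (maximum relative change over the interior nodes)
         ERR = MAXVAL( ABS(TNEW(2:IMAX-1,2:JMAX-1) - T(2:IMAX-1,2:JMAX-1)) &
                       / T(2:IMAX-1,2:JMAX-1) )
         T(2:IMAX-1,2:JMAX-1) = TNEW(2:IMAX-1,2:JMAX-1)
         IF (ERR < TOL) EXIT
      END DO

      PRINT *, 'Iterations:', K, '  final relative change:', ERR
      END PROGRAM JACOBI_SERIAL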

  4. Results
     - Iterative Convergence

       \varepsilon_{i,j} = (T_{i,j}^{k+1} - T_{i,j}^k)/T_{i,j}^k

       \varepsilon_{L_\alpha} = [ (1/N) \sum_{i=2}^{imax-1} \sum_{j=2}^{jmax-1} (\varepsilon_{i,j})^\alpha ]^{1/\alpha}

     Results
     - Temperature Distribution
       [Contour plot of the converged temperature distribution]
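     A sketch of how the norm above could be evaluated for \alpha = 2 (an L2 norm of the relative change over the interior nodes). The function and array names are assumptions, not the authors' code.

      DOUBLE PRECISION FUNCTION CONV_NORM(TOLD, TNEW, IMAX, JMAX)
!     L2 norm of the iteration-to-iteration relative change, interior nodes only
      IMPLICIT NONE
      INTEGER, INTENT(IN) :: IMAX, JMAX
      DOUBLE PRECISION, INTENT(IN) :: TOLD(IMAX,JMAX), TNEW(IMAX,JMAX)
      DOUBLE PRECISION :: SUM2
      INTEGER :: I, J, N

      N = (IMAX-2)*(JMAX-2)          ! number of interior nodes
      SUM2 = 0.0D0
      DO J = 2, JMAX-1
         DO I = 2, IMAX-1
            SUM2 = SUM2 + ( (TNEW(I,J) - TOLD(I,J)) / TOLD(I,J) )**2
         END DO
      END DO
      CONV_NORM = SQRT(SUM2 / N)     ! ( (1/N) * sum of eps^2 )**(1/2)
      END FUNCTION CONV_NORM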

  5. Code Verification
     - Method of Manufactured Solutions (MMS)

       \tilde{T} = C_1 + C_2 \sin(\pi x/a_1) + C_3 \cos(\pi y/a_2) + C_4 \sin(\pi x y/a_3)

       \partial^2\tilde{T}/\partial x^2 + \partial^2\tilde{T}/\partial y^2
         = -C_2 (\pi/a_1)^2 \sin(\pi x/a_1) - C_3 (\pi/a_2)^2 \cos(\pi y/a_2)
           - C_4 (\pi y/a_3)^2 \sin(\pi x y/a_3) - C_4 (\pi x/a_3)^2 \sin(\pi x y/a_3) = f(x, y)

     Code Verification
     - Discretization Error (DE)

       DE_{i,j}^{k+1} = \tilde{T}_{i,j,NUMERICAL}^{k+1} - \tilde{T}_{i,j,EXACT}^{k+1}

       \tilde{T}_{i,j,NUMERICAL}^{k+1} \cong [ (\tilde{T}_{i,j+1}^k + \tilde{T}_{i,j-1}^k)\Delta x^2 + (\tilde{T}_{i+1,j}^k + \tilde{T}_{i-1,j}^k)\Delta y^2 ] / [ 2(\Delta x^2 + \Delta y^2) ]
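     A sketch of the manufactured solution and its source term as Fortran functions, following the expressions above. The numerical values of C1-C4 and a1-a3 are arbitrary placeholders (they are free parameters in MMS), not the authors' choices.

      MODULE MMS
      IMPLICIT NONE
      DOUBLE PRECISION, PARAMETER :: PI = 3.14159265358979D0
!     Free MMS parameters -- placeholder values, chosen by the user
      DOUBLE PRECISION, PARAMETER :: C1 = 400.0D0, C2 = 50.0D0,            &
                                     C3 = 50.0D0,  C4 = 25.0D0
      DOUBLE PRECISION, PARAMETER :: A1 = 1.0D0, A2 = 1.0D0, A3 = 1.0D0
      CONTAINS

!     Manufactured temperature field T~(x,y)
      DOUBLE PRECISION FUNCTION T_MMS(X, Y)
      DOUBLE PRECISION, INTENT(IN) :: X, Y
      T_MMS = C1 + C2*SIN(PI*X/A1) + C3*COS(PI*Y/A2) + C4*SIN(PI*X*Y/A3)
      END FUNCTION T_MMS

!     Source term f(x,y) = d2T~/dx2 + d2T~/dy2, added to the discrete equation
      DOUBLE PRECISION FUNCTION F_MMS(X, Y)
      DOUBLE PRECISION, INTENT(IN) :: X, Y
      F_MMS = -C2*(PI/A1)**2*SIN(PI*X/A1) - C3*(PI/A2)**2*COS(PI*Y/A2)     &
              -C4*(PI*Y/A3)**2*SIN(PI*X*Y/A3) - C4*(PI*X/A3)**2*SIN(PI*X*Y/A3)
      END FUNCTION F_MMS

      END MODULE MMS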

  6. Code Verification
     - Discretization Error (DE)

       Mesh Nodes   Maximum DE (K)
       10 x 10      13.00
       25 x 25       1.30
       50 x 50       0.34

     Code Verification
     - Global Discretization Error
       - Formal Order of Accuracy
       - Observed Order of Accuracy

       [Log-log plot: L2 norm of the global DE versus the grid refinement factor h, compared against a second-order reference slope, with h_k = \Delta x_k/\Delta x_1 = \Delta y_k/\Delta y_1]
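     One common way to turn the table above into an observed order of accuracy is the two-grid estimate p = ln(DE_coarse/DE_fine)/ln(r), where r is the grid refinement factor. A small sketch (the function name is illustrative):

      DOUBLE PRECISION FUNCTION OBS_ORDER(DE_COARSE, DE_FINE, R)
!     Observed order of accuracy from two grid levels
      IMPLICIT NONE
      DOUBLE PRECISION, INTENT(IN) :: DE_COARSE, DE_FINE  ! DE on coarse and fine grids
      DOUBLE PRECISION, INTENT(IN) :: R                   ! refinement factor h_coarse/h_fine
      OBS_ORDER = LOG(DE_COARSE/DE_FINE) / LOG(R)
      END FUNCTION OBS_ORDER

     If the mesh sizes in the table are node counts on a fixed plate, the 25 x 25 and 50 x 50 entries give p = ln(1.30/0.34)/ln(49/24), roughly 1.9, consistent with the formal second-order accuracy of the central-difference scheme.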

  7. Parallelization
     Domain Decomposition for 2 Processors
     - Blue box represents information to be passed between processors after each iteration.
     - Red boxes are fixed boundary conditions.
     - Green boxes include the grid points that are initially sent to each processor.

     Parallel Code Structure
     - The code is divided into three main sections: the portion performed by all processors, the portion performed by the master processor, and the portion performed by the slave processors.

     All Processors
       Declare Variables
       Dimension Arrays
       INCLUDE 'MPIF.H'
       Initialize MPI
       If I am master then ...
       Else (slave processors) ...
       End If

     - MPIF.H is a file telling the compiler where to find the MPI libraries.
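     A minimal sketch of the "all processors" section outlined above. The program name, MASTER = 0, and the stubbed comments are assumptions about the structure, not the authors' actual code.

      PROGRAM HEAT_MPI
      IMPLICIT NONE
      INCLUDE 'mpif.h'                 ! MPI constants and interfaces (MPIF.H on some systems)
      INTEGER, PARAMETER :: MASTER = 0
      INTEGER :: MYID, NUMPROCS, IERR

!     Declare variables and dimension arrays here

      CALL MPI_INIT(IERR)                                  ! initialize MPI
      CALL MPI_COMM_RANK(MPI_COMM_WORLD, MYID, IERR)       ! which processor am I?
      CALL MPI_COMM_SIZE(MPI_COMM_WORLD, NUMPROCS, IERR)   ! how many processors are there?

      IF (MYID == MASTER) THEN
!        Master: set initial and boundary conditions, decompose the grid,
!        send each slave its block, then collect and reassemble the result.
      ELSE
!        Slaves: receive a block from the master, iterate, exchange edge
!        columns each iteration, and return the converged block.
      END IF

      CALL MPI_FINALIZE(IERR)
      END PROGRAM HEAT_MPI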

  8. Parallel Code Structure
     - The job of the master processor is to initialize the grid with initial and boundary conditions, then decompose it and send each processor the information it needs.
     - Each slave processor receives its initial grid from the master node and begins to perform calculations. After each iteration, individual processors must pass the first and last columns of their respective grids to the neighboring processors to update their values.
     - The slave processors iterate until an acceptable convergence has been reached and then send the new temperature values back to the master processor to reassemble the grid.

     MPI Functions
     - MPI Functions Called By All Processors
       - MPI_INIT(IERR)
       - MPI_FINALIZE(IERR)
       - MPI_COMM_RANK(MPI_COMM_WORLD, MYID, IERR)
       - MPI_COMM_SIZE(MPI_COMM_WORLD, NUMPROCS, IERR)
     - MPI Communication Operations
       - MPI_SEND(BUFFER, COUNT, DATATYPE, DESTINATION, TAG, MPI_COMM_WORLD, IERR)
       - MPI_RECV(BUFFER, COUNT, DATATYPE, SOURCE, TAG, MPI_COMM_WORLD, STATUS, IERR)
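     A sketch of the per-iteration edge-column exchange described above, for a one-dimensional (column-wise) decomposition. The slide lists MPI_SEND and MPI_RECV; this sketch uses MPI_SENDRECV, which pairs the two in a single call and avoids the deadlock that can occur when every processor posts a blocking send at the same time. The local array layout, ghost-column indices, neighbor ranks, and tag values are illustrative assumptions.

      SUBROUTINE EXCHANGE_COLUMNS(T, IMAX, JLOC, LEFT, RIGHT)
!     Swap the first/last real columns with the left/right neighbors after each
!     iteration.  Ghost columns 0 and JLOC+1 hold the neighbors' edge values.
!     LEFT/RIGHT are neighbor ranks, or MPI_PROC_NULL at a physical boundary
!     (which turns the corresponding transfer into a no-op).
      IMPLICIT NONE
      INCLUDE 'mpif.h'
      INTEGER, INTENT(IN) :: IMAX, JLOC, LEFT, RIGHT
      DOUBLE PRECISION, INTENT(INOUT) :: T(IMAX, 0:JLOC+1)
      INTEGER :: STATUS(MPI_STATUS_SIZE), IERR

!     Send my first real column left; receive the right neighbor's first column
      CALL MPI_SENDRECV(T(1,1),      IMAX, MPI_DOUBLE_PRECISION, LEFT,  1,  &
                        T(1,JLOC+1), IMAX, MPI_DOUBLE_PRECISION, RIGHT, 1,  &
                        MPI_COMM_WORLD, STATUS, IERR)
!     Send my last real column right; receive the left neighbor's last column
      CALL MPI_SENDRECV(T(1,JLOC),   IMAX, MPI_DOUBLE_PRECISION, RIGHT, 2,  &
                        T(1,0),      IMAX, MPI_DOUBLE_PRECISION, LEFT,  2,  &
                        MPI_COMM_WORLD, STATUS, IERR)
      END SUBROUTINE EXCHANGE_COLUMNS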
