Parallelism Inherent in the Wavefront Algorithm
Gavin J. Pringle
The Benchmark code
• Particle transport code using the wavefront algorithm
• Primarily used for benchmarking
• Coded in Fortran 90 and MPI
• Scales to thousands of cores for large problems
• Over 90% of time in one kernel at the heart of the computation
Serial Algorithm Outline
  Outer iteration
    Loop over energy groups
      Inner iteration
        Loop over sweeps
          Loop over cells in z direction
            Loop over cells in y direction
              Loop over cells in x direction
                Loop over angles (only independent loop!)
                  work (90% of time spent here)
                End loop over angles
              End loop over cells in x direction
            End loop over cells in y direction
          End loop over cells in z direction
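A minimal serial sketch of this loop nest, assuming a hypothetical 6x6x6 grid with 8 angles; the "work" body is a placeholder for the transport kernel, not the benchmark's actual code.

```fortran
program serial_sweep
  implicit none
  integer, parameter :: nx = 6, ny = 6, nz = 6, nang = 8   ! illustrative sizes
  integer, parameter :: ngroups = 4, nsweeps = 8
  real    :: flux(nang, nx, ny, nz)
  integer :: g, s, i, j, k, a

  flux = 0.0
  do g = 1, ngroups              ! outer iteration: loop over energy groups
     do s = 1, nsweeps           ! inner iteration: loop over sweeps
        do k = 1, nz             ! loop over cells in z direction
           do j = 1, ny          ! loop over cells in y direction
              do i = 1, nx       ! loop over cells in x direction
                 do a = 1, nang  ! loop over angles: the only independent loop
                    ! "work": placeholder for the kernel where ~90% of time is spent
                    flux(a, i, j, k) = flux(a, i, j, k) + 1.0
                 end do
              end do
           end do
        end do
     end do
  end do
  print *, 'sum of flux = ', sum(flux)
end program serial_sweep
```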
Close up of parallelised loops over cells
  Loop over cells in z direction
    Possible MPI_Recv communications
    Loop over cells in y direction
      Loop over cells in x direction
        Loop over angles (number of angles too small for MPI)
          work
        End loop over angles
      End loop over cells in x direction
    End loop over cells in y direction
    Possible MPI_Ssend communications
  End loop over cells in z direction
MPI 2D decomposition
• The MPI decomposition is a 2D decomposition of the front x-y face.
• The figure shows 4 MPI tasks (figure axes: l, k, j).
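A minimal sketch of how such a 2D decomposition of the x-y face could be set up with a Cartesian communicator; the variable names (cart_comm, left, right, down, up) are illustrative, not the benchmark's.

```fortran
program decompose_xy
  use mpi
  implicit none
  integer :: ierr, rank, nprocs, cart_comm
  integer :: dims(2), coords(2)
  logical :: periods(2)
  integer :: left, right, down, up

  call MPI_Init(ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  dims = 0
  periods = .false.
  call MPI_Dims_create(nprocs, 2, dims, ierr)          ! e.g. 4 tasks -> 2 x 2
  call MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, .true., cart_comm, ierr)
  call MPI_Cart_coords(cart_comm, rank, 2, coords, ierr)

  ! Upstream/downstream neighbours in the x-y face, used later by the sweep
  call MPI_Cart_shift(cart_comm, 0, 1, left, right, ierr)
  call MPI_Cart_shift(cart_comm, 1, 1, down, up, ierr)

  print *, 'rank', rank, 'coords', coords, 'neighbours', left, right, down, up
  call MPI_Finalize(ierr)
end program decompose_xy
```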
Diagram of dependencies
• This diagram shows the domain of one MPI task.
• A cell cannot be processed until all cells upstream of it in the sweep have been processed.
• Figure labels: MPI data FromTop, MPI data FromRight (incoming); MPI data ToLeft, MPI data ToBottom (outgoing).
Sweep order: 3D diagonal slices
• Cells of the same colour are independent and may be processed in parallel once preceding slices are complete.
• Figure labels: MPI data FromTop, MPI data FromRight (incoming); MPI data ToLeft, MPI data ToBottom (outgoing).
Slice shapes (6x6x6)
• Increasing triangles
• Then transforming into hexagons
• Then decreasing (flipped) triangles
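A short sketch that enumerates the diagonal slices of a 6x6x6 block: all cells with i + j + k constant lie on one slice and are mutually independent. Printing the cell count per slice reproduces the triangle/hexagon/triangle pattern above; the grid size and names are illustrative.

```fortran
program diagonal_slices
  implicit none
  integer, parameter :: nx = 6, ny = 6, nz = 6
  integer :: slice, i, j, k, ncells

  ! For a 6x6x6 block there are 3*6 - 2 = 16 slices (shown on the slides that follow)
  do slice = 3, nx + ny + nz
     ncells = 0
     do k = 1, nz
        do j = 1, ny
           do i = 1, nx
              if (i + j + k == slice) ncells = ncells + 1
           end do
        end do
     end do
     print *, 'slice', slice - 2, 'holds', ncells, 'independent cells'
  end do
end program diagonal_slices
```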
Slices 1 to 16 (one slide each): the first slice is the single cell nearest the viewer; the sweep then moves down and away from the viewer slice by slice until slice 16, the point furthest from the viewer.
Close up of parallelised loops over cells using MPI
  Loop over cells in z direction
    Possible MPI_Recv communications
    Loop over cells in y direction
      Loop over cells in x direction
        Loop over angles (number of angles too small for MPI)
          work
        End loop over angles
      End loop over cells in x direction
    End loop over cells in y direction
    Possible MPI_Ssend communications
  End loop over cells in z direction
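A minimal sketch of this receive-compute-send pattern, assuming a 2D Cartesian decomposition of the x-y face and one particular sweep direction (data arriving FromTop and FromRight, leaving ToBottom and ToLeft, as in the figure). Buffer sizes, neighbour names and the "work" body are placeholders, not the benchmark's code. Edge tasks see MPI_PROC_NULL neighbours, so their receives and sends are no-ops and the pipeline starts at a corner task.

```fortran
program mpi_sweep
  use mpi
  implicit none
  integer, parameter :: nx = 6, ny = 6, nz = 6, nang = 8   ! local block, illustrative
  integer :: ierr, rank, nprocs, cart_comm
  integer :: dims(2)
  logical :: periods(2)
  integer :: left, right, down, up
  integer :: status(MPI_STATUS_SIZE)
  integer :: k, j, i, a
  real    :: edge_x(ny*nang), edge_y(nx*nang)   ! incoming/outgoing cell-face data
  real    :: flux(nang, nx, ny)

  call MPI_Init(ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  dims = 0
  periods = .false.
  call MPI_Dims_create(nprocs, 2, dims, ierr)
  call MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, .true., cart_comm, ierr)
  call MPI_Cart_shift(cart_comm, 0, 1, left, right, ierr)
  call MPI_Cart_shift(cart_comm, 1, 1, down, up, ierr)

  edge_x = 0.0;  edge_y = 0.0;  flux = 0.0
  do k = 1, nz
     ! Possible MPI_Recv: edge tasks have MPI_PROC_NULL upstream, so these
     ! calls return immediately there and the pipeline starts at the corner task.
     call MPI_Recv(edge_y, nx*nang, MPI_REAL, up,    0, cart_comm, status, ierr)
     call MPI_Recv(edge_x, ny*nang, MPI_REAL, right, 1, cart_comm, status, ierr)
     do j = 1, ny
        do i = 1, nx
           do a = 1, nang
              flux(a, i, j) = flux(a, i, j) + 1.0   ! placeholder for "work"
           end do
        end do
     end do
     ! Possible MPI_Ssend: pass the updated faces to the downstream tasks.
     call MPI_Ssend(edge_y, nx*nang, MPI_REAL, down, 0, cart_comm, ierr)
     call MPI_Ssend(edge_x, ny*nang, MPI_REAL, left, 1, cart_comm, ierr)
  end do
  call MPI_Finalize(ierr)
end program mpi_sweep
```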
Close up of parallelised loops over cells using MPI and OpenMP
  Loop over slices
    Possible MPI_Recv communications
    OMP PARALLEL DO
    Loop over cells in each slice
      OMP PARALLEL DO
      Loop over angles
        work
      End loop over angles
      OMP END PARALLEL DO
    End loop over cells in each slice
    OMP END PARALLEL DO
    Possible MPI_Ssend communications
  End loop over slices
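A minimal OpenMP-only sketch of the slice loop on one task's block, assuming the cells of each slice are gathered into a list first; the MPI_Recv/MPI_Ssend calls of the previous slide would sit where the comments indicate. The single PARALLEL DO over cells, with a sequential inner angle loop, is an illustrative simplification of the nested directives shown above.

```fortran
program omp_slice_sweep
  implicit none
  integer, parameter :: nx = 6, ny = 6, nz = 6, nang = 8   ! illustrative sizes
  integer :: slice, i, j, k, a, c, ncell
  integer :: cell(3, nx*ny*nz)          ! (i,j,k) of every cell in the current slice
  real    :: flux(nang, nx, ny, nz)

  flux = 0.0
  do slice = 3, nx + ny + nz            ! loop over diagonal slices
     ! (Possible MPI_Recv communications would go here)
     ncell = 0                          ! gather the independent cells of this slice
     do k = 1, nz
        do j = 1, ny
           do i = 1, nx
              if (i + j + k == slice) then
                 ncell = ncell + 1
                 cell(:, ncell) = (/ i, j, k /)
              end if
           end do
        end do
     end do
     !$OMP PARALLEL DO PRIVATE(a)
     do c = 1, ncell                    ! cells of one slice are independent
        do a = 1, nang                  ! angles are independent too
           flux(a, cell(1,c), cell(2,c), cell(3,c)) = &
                flux(a, cell(1,c), cell(2,c), cell(3,c)) + 1.0   ! placeholder "work"
        end do
     end do
     !$OMP END PARALLEL DO
     ! (Possible MPI_Ssend communications would go here)
  end do
  print *, 'sum of flux = ', sum(flux)
end program omp_slice_sweep
```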
Parallel Algorithm Outline
  Outer iteration
    Loop over energy groups
      Inner iteration
        Loop over sweeps
          Loop over slices
            Possible MPI_Recv communications
            OMP PARALLEL DO
            Loop over cells in each slice
              OMP PARALLEL DO
              Loop over angles
                work
              End loop over angles
              Etc.
Decoupling interdependent energy group calculations
• Initially, each energy group calculation used the previous energy group's results as input
• Decoupling the energy groups has two outcomes
  • Execution time is greatly increased
  • Energy groups are now independent and can be parallelised
• This trade-off is often seen in HPC
  • Modern algorithms can be inherently serial
  • An older version may be parallelisable
Task Farm Summary
• If all the tasks take the same time to compute
  • Block distribution of tasks
  • Cyclic distribution of tasks
  • (either will do)
• Else if all tasks have different execution times
  • If the lengths of the tasks are unknown in advance
    • Cyclic distribution of tasks
  • Else
    • Order tasks: longest first, shortest last
    • Cyclic distribution of tasks
  • Endif
• Endif
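A minimal sketch of block versus cyclic assignment of independent tasks (e.g. energy groups) to workers, following the rules above; the task and worker counts are illustrative only.

```fortran
program taskfarm_distribution
  implicit none
  integer, parameter :: ntask = 10, nworker = 3   ! illustrative counts
  integer :: t, chunk

  ! Block distribution: contiguous chunks, fine when all tasks take the same time
  chunk = (ntask + nworker - 1) / nworker
  do t = 1, ntask
     print *, 'block : task', t, '-> worker', (t - 1) / chunk
  end do

  ! Cyclic distribution: round robin, safer when task costs differ or are unknown
  ! (pair it with longest-first ordering when the costs are known in advance)
  do t = 1, ntask
     print *, 'cyclic: task', t, '-> worker', mod(t - 1, nworker)
  end do
end program taskfarm_distribution
```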
Final Parallel Algorithm Outline
  Outer iteration
    MPI Task Farm of energy groups
      Inner iteration
        Loop over sweeps
          Loop over slices
            Possible MPI_Recv communications
            OMP PARALLEL DO
            Loop over cells in each slice
              OMP PARALLEL DO
              Loop over angles
                work
              End loop over angles
              Etc.
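A minimal sketch of farming the energy groups out over sub-communicators, assuming a cyclic distribution of groups and an illustrative number of farms; each farm would then run the MPI+OpenMP sweep of the previous slides on its own x-y decomposition. All names and sizes are illustrative.

```fortran
program energy_group_farm
  use mpi
  implicit none
  integer, parameter :: nfarm = 2, ngroups = 8   ! illustrative values
  integer :: ierr, rank, nprocs, farm, farm_comm, farm_rank, g

  call MPI_Init(ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  farm = mod(rank, nfarm)                        ! which farm this task joins
  call MPI_Comm_split(MPI_COMM_WORLD, farm, rank, farm_comm, ierr)
  call MPI_Comm_rank(farm_comm, farm_rank, ierr)

  do g = 1, ngroups
     if (mod(g - 1, nfarm) == farm) then         ! cyclic distribution of groups
        ! Each farm would run the MPI+OpenMP sweep of the previous slides
        ! for energy group g, using farm_comm for its 2D decomposition.
        print *, 'farm', farm, 'rank', farm_rank, 'handles energy group', g
     end if
  end do

  call MPI_Comm_free(farm_comm, ierr)
  call MPI_Finalize(ierr)
end program energy_group_farm
```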
Conclusion
• Other wavefront codes have the loops in a different order
• The loop over energy groups can occur within the loops over cells and might be parallelised with OpenMP
  • It must first be decoupled
Thank you
• Any questions?
• gavin@epcc.ed.ac.uk