Introduction to High Performance Computing and Optimization (Oliver Ernst)


  1. Institut für Numerische Mathematik und Optimierung Introduction to High Performance Computing and Optimization Oliver Ernst Audience: 1./3. CMS, 5./7./9. Mm, doctoral students Wintersemester 2012/13

  2. Contents 1. Introduction 2. Processor Architecture 3. Optimization of Serial Code 3.1 Performance Measurement 3.2 Optimization Guidelines 3.3 Compiler-Aided Optimization 3.4 Combine Example 3.5 Further Optimization Issues 4. Parallel Computing 4.1 Introduction 4.2 Scalability 4.3 Parallel Architectures 4.4 Networks 5. OpenMP Programming Oliver Ernst (INMO) Wintersemester 2012/13 1 HPC

  3. Contents 1. Introduction 2. Processor Architecture 3. Optimization of Serial Code 4. Parallel Computing 4.1 Introduction 4.2 Scalability 4.3 Parallel Architectures 4.4 Networks 5. OpenMP Programming Oliver Ernst (INMO) Wintersemester 2012/13 138 HPC

  4. Contents 4. Parallel Computing 4.1 Introduction 4.2 Scalability 4.3 Parallel Architectures 4.4 Networks Oliver Ernst (INMO) Wintersemester 2012/13 139 HPC

  5. Parallel Computing Introduction Many processing units (computers, nodes, processors, cores, threads) collaborate to solve one problem concurrently. Currently, “many” means up to about 1.5 million cores (current Top500 leader). Objectives: faster execution time for one task (speedup), solution of a larger problem (scaleup), or memory requirements that exceed the resources of a single computer. Challenges for hardware designers: power, communication network, memory bandwidth, low-level synchronization (e.g. cache coherency), file system. Challenges for the programmer: load balancing, synchronization/communication, algorithm design and redesign, software interface. Goal: make maximal use of the computer’s resources. Oliver Ernst (INMO) Wintersemester 2012/13 140 HPC

  6. Parallel Computing Types of parallelism: Data parallelism. The scale of parallelism refers to the size of concurrently executed tasks. Fine-grain parallelism at the scale of the functional units of a processor (ILP), individual instructions or micro-instructions. Medium-grain parallelism at the scale of independent iterations of a loop (e.g. linear algebra operations on vectors, matrices, tensors). Coarse-grain parallelism refers to larger computational tasks with looser synchronization (e.g. domain decomposition methods in PDE/linear system solvers). Data-parallel applications are usually implemented using an SPMD (Single Program, Multiple Data) software design, in which the same program runs on all processing units, but not in the tightly synchronized lockstep fashion of SIMD. Oliver Ernst (INMO) Wintersemester 2012/13 141 HPC
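
A minimal sketch of medium-grain data parallelism in the SPMD style, using an OpenMP parallel loop (OpenMP itself is the topic of Section 5); the vector update, array size, and file name are illustrative choices, not from the slides:

```c
#include <stdio.h>
#include <omp.h>

/* Medium-grain data parallelism in SPMD style: the independent iterations
   of the vector update y = y + a*x are distributed over the available
   threads; every thread executes the same program on a different index
   range. Compile e.g. with: gcc -fopenmp daxpy.c */
int main(void) {
    enum { N = 1000000 };
    static double x[N], y[N];        /* static: keep large arrays off the stack */
    const double a = 2.0;

    for (int i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

    #pragma omp parallel for         /* iterations split among threads */
    for (int i = 0; i < N; i++)
        y[i] += a * x[i];

    printf("y[0] = %g, max threads = %d\n", y[0], omp_get_max_threads());
    return 0;
}
```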

  7. Parallel Computing Types of parallelism: Functional parallelism. Concurrent execution of different tasks; programming style known as MPMD (Multiple Program, Multiple Data). More difficult to load balance. Variants: master-slave scheme: one administrative unit distributes tasks and collects results; the remaining units receive tasks and report results to the master upon completion. Large-scale functional decomposition: large, loosely coupled tasks executed on larger computational units with looser synchronization (e.g. climate models coupling ocean and atmospheric dynamics, fluid-structure interaction codes, “multiphysics” codes). Oliver Ernst (INMO) Wintersemester 2012/13 142 HPC
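
A minimal sketch of functional parallelism: two different tasks run concurrently as OpenMP sections. The two function names are hypothetical placeholders standing in for the loosely coupled model components mentioned above:

```c
#include <stdio.h>

/* Functional (MPMD-style) parallelism in miniature: two *different* tasks
   executed concurrently as OpenMP sections. The functions are placeholders
   for loosely coupled model components (e.g. ocean/atmosphere).
   Compile e.g. with: gcc -fopenmp coupled.c */
static double simulate_ocean(void)      { return 1.0; }   /* placeholder */
static double simulate_atmosphere(void) { return 2.0; }   /* placeholder */

int main(void) {
    double ocean = 0.0, atmosphere = 0.0;

    #pragma omp parallel sections
    {
        #pragma omp section
        ocean = simulate_ocean();

        #pragma omp section
        atmosphere = simulate_atmosphere();
    }

    printf("coupled result: %g\n", ocean + atmosphere);
    return 0;
}
```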

  8. Contents 4. Parallel Computing 4.1 Introduction 4.2 Scalability 4.3 Parallel Architectures 4.4 Networks Oliver Ernst (INMO) Wintersemester 2012/13 143 HPC

  9. Scalability Basic considerations. T: time for 1 worker to complete the task; with N workers the ideal speedup (perfect linear scaling) is $S := \frac{T}{T/N} = N$. Not all computational (or other) tasks scale in this ideal way. “Nine women can’t make a baby in one month.” Fred Brooks, The Mythical Man-Month (1975). Limiting factors: not all workers receive tasks of equal complexity (or they aren’t equally fast): load imbalance; some resources necessary for task completion are not available N times: serialization of concurrent execution while waiting for access; extra work/waiting time due to parallel execution: overhead which is not required for serial task completion. Oliver Ernst (INMO) Wintersemester 2012/13 144 HPC
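
A small worked example (numbers chosen only for illustration, not from the slides) of how load imbalance alone already caps the achievable speedup:

```latex
% Illustrative numbers: T = 100, N = 4 workers.
% Ideal case: each worker gets T/N = 25, so the speedup equals N.
% Load imbalance: if one worker receives 40% of the work, the parallel
% completion time is at least 0.4 T = 40, so the speedup cannot exceed 2.5.
\[
  S_{\text{ideal}} = \frac{T}{T/N} = N = 4,
  \qquad
  S_{\text{imbalanced}} \le \frac{T}{0.4\,T} = 2.5 .
\]
```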

  12. Scalability Performance metrics: Strong scaling. $T = T^s_f = s + p$: serial task completion time, fixed problem size; s: serial (not parallelizable) portion of the task; p: (perfectly) parallelizable portion of the task. Solution time using N workers: $T^p_f = s + \frac{p}{N}$. Known as strong scaling since the task size is fixed; parallelization is used to reduce the solution time for a fixed problem. Oliver Ernst (INMO) Wintersemester 2012/13 145 HPC
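
A short worked example under the normalization $s + p = 1$ (numbers illustrative, not from the slides):

```latex
% Illustrative numbers: serial fraction s = 0.1, parallel fraction p = 0.9,
% N = 10 workers, normalized so that s + p = 1.
\[
  T^p_f = s + \frac{p}{N} = 0.1 + \frac{0.9}{10} = 0.19 ,
\]
% i.e. the runtime shrinks only by a factor of about 5.3 rather than 10,
% because the serial portion s is untouched by adding workers.
```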

  15. Scalability Performance metrics: Weak scaling. Use parallelism to solve a larger problem: assume s fixed and the parallelizable portion grows with N like $N^\alpha$, $\alpha > 0$ (often $\alpha = 1$). Then $T = T^s_v = s + pN^\alpha$: serial task completion time, variable problem size. Solution time using N workers: $T^p_v = s + pN^{\alpha-1}$. Known as weak scaling since the task size is variable; parallelization is used to solve a larger problem. Oliver Ernst (INMO) Wintersemester 2012/13 146 HPC
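
A worked special case for $\alpha = 1$, the most common weak-scaling assumption:

```latex
% Special case alpha = 1 (problem size grows linearly with N):
\[
  T^p_v = s + pN^{\alpha-1} = s + p \qquad (\alpha = 1),
\]
% so N workers handle an N-times larger parallel workload in the same
% wall-clock time that one worker needs for the original problem.
```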

  18. Scalability Application speedup. Define performance := work/time and application speedup := parallel performance / serial performance. Serial performance for fixed problem size $s + p$: $P^s_f = \frac{s+p}{T^s_f} = \frac{s+p}{s+p} = 1$. Parallel performance for fixed problem size, normalized so that $s + p = 1$: $P^p_f = \frac{s+p}{T^p_f(N)} = \frac{1}{s + \frac{1-s}{N}}$. Application speedup (fixed problem size): $S_f = \frac{P^p_f}{P^s_f} = \frac{1}{s + \frac{1-s}{N}}$ (cf. Amdahl’s Law). Oliver Ernst (INMO) Wintersemester 2012/13 147 HPC
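
A minimal sketch evaluating this Amdahl-type speedup $S_f = 1/(s + \frac{1-s}{N})$ for a sample serial fraction; the value $s = 0.05$ and the list of worker counts are assumptions for illustration:

```c
#include <stdio.h>

/* Amdahl-type application speedup S_f(N) = 1 / (s + (1-s)/N) for a fixed
   problem size; the serial fraction s = 0.05 is an assumed sample value. */
static double amdahl_speedup(double s, int N) {
    return 1.0 / (s + (1.0 - s) / N);
}

int main(void) {
    const double s = 0.05;
    const int workers[] = { 1, 2, 4, 16, 64, 1024 };
    const int cases = sizeof workers / sizeof workers[0];

    for (int i = 0; i < cases; i++)
        printf("N = %4d   S_f = %6.2f\n", workers[i], amdahl_speedup(s, workers[i]));

    /* As N grows, S_f approaches 1/s = 20: the serial fraction bounds
       the achievable speedup for a fixed problem size. */
    return 0;
}
```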

  19. Scalability Application speedup: different notion of “work”. Count as work only the parallelizable portion. Serial performance: $P^{sp}_f = \frac{p}{T^s_f} = p$. Parallel performance: $P^{pp}_f = \frac{p}{T^p_f} = \frac{1-s}{s + \frac{1-s}{N}}$. Application speedup: $S^p_f = \frac{P^{pp}_f}{P^{sp}_f} = \frac{1}{s + \frac{1-s}{N}}$. $P^{pp}_f$ is no longer identical with $S^p_f$: the scalability doesn’t change, but the performance does (it is a factor of $p$ smaller). Oliver Ernst (INMO) Wintersemester 2012/13 148 HPC
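
A one-line check, with the normalization $s + p = 1$, of why the reported performance drops by exactly the factor $p$ while the scaling in $N$ is unchanged:

```latex
% With s + p = 1 the two quantities differ exactly by the factor p = 1 - s:
\[
  P^{pp}_f \;=\; \frac{1-s}{\,s + \frac{1-s}{N}\,} \;=\; (1-s)\, S^p_f ,
\]
% so redefining "work" rescales the measured performance, but the
% dependence on N (the scalability) stays the same.
```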
