Parallel Programming
Instructor: Pantea Zardoshti
Department of Computer Engineering, Sharif University of Technology
e-mail: azad@sharif.edu
Objective
Learn how to program numerical methods.
Single CPU Performance
Past:
• Doubled every 2 years for 40 years, until 9 years ago.
Current Situation:
• Marginal improvement in the last 9 years.
Main Reasons:
• Memory Wall
• Power Wall
• Processor Design Complexity
Memory Wall
[Figure: Processor-Memory Performance Gap Growing. Source: Intel]
Power Wall
[Figure. Source: Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, 2006]
What is Serial Computing?
Traditionally, software has been written for serial computation:
• To be run on a single computer having a single Central Processing Unit (CPU);
• A problem is broken into a discrete series of instructions;
• Instructions are executed one after another;
• Only one instruction may execute at any moment in time.
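As a concrete illustration, here is a minimal serial C sketch (the array size and the summation task are invented for the example): one stream of instructions, each iteration running only after the previous one finishes.

```c
#include <stdio.h>

#define N 1000000

int main(void) {
    static double a[N];
    double sum = 0.0;

    /* One CPU, one instruction stream: each iteration executes
       only after the previous one has completed. */
    for (int i = 0; i < N; i++)
        a[i] = 1.0 / N;
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %f\n", sum);
    return 0;
}
```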
What is Parallel Computing?
Parallel computing is the simultaneous use of multiple compute resources to solve a computational problem:
• To be run using multiple CPUs;
• A problem is broken into discrete parts that can be solved concurrently;
• Each part is further broken down into a series of instructions;
• Instructions from each part execute simultaneously on different CPUs.
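Since this course uses OpenMP, here is a minimal sketch of the same summation with its loop broken into parts solved concurrently. The array size and the use of a `reduction` clause are illustrative choices, not part of the slide.

```c
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static double a[N];
    double sum = 0.0;

    for (int i = 0; i < N; i++)
        a[i] = 1.0 / N;

    /* The loop iterations are divided among the available threads;
       each thread sums its own part, and the partial sums are
       combined by the reduction at the end. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %f (up to %d threads)\n", sum, omp_get_max_threads());
    return 0;
}
```

Compile with an OpenMP-enabled compiler, e.g. `gcc -fopenmp`.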
Serial vs. Parallel
Parallel Computing: Resources
The compute resources can include:
• A single computer with multiple processors;
• A single computer with (multiple) processor(s) and some specialized compute resources (GPU, Xeon Phi, …);
• An arbitrary number of computers connected by a network;
• A combination of the above.
Why Parallel Computing?
The primary reasons for using parallel computing:
• Save time
• Solve larger problems
• Provide concurrency (do multiple things at the same time)
Principles of Parallel Computing
• Finding enough parallelism (Amdahl's Law)
• Granularity
• Locality
• Load balance
• Coordination and synchronization
Amdahl's Law
Let (1 − f) be the fraction of work that must be done sequentially, so f is the fraction that can be parallelized.
Let P be the number of processors.

Speedup(P) = Time(1) / Time(P)
Maximum Speedup ≤ 1 / ((1 − f) + f / P)

Example:
• Let f be 80% (0.8)
• Then Maximum Speedup ≤ 1 / (0.2 + 0.8 / P) < 1 / 0.2 = 5, so the speedup cannot exceed 5 even if you have hundreds of processors.
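A small worked sketch of this bound (the function name `amdahl_speedup` is invented for illustration):

```c
#include <stdio.h>

/* Maximum speedup predicted by Amdahl's Law:
   f is the parallelizable fraction, p the number of processors. */
double amdahl_speedup(double f, int p) {
    return 1.0 / ((1.0 - f) + f / p);
}

int main(void) {
    double f = 0.8;  /* 80% of the work is parallelizable */
    for (int p = 1; p <= 1024; p *= 4)
        printf("P = %4d  speedup <= %.2f\n", p, amdahl_speedup(f, p));
    /* As P grows, the bound approaches 1 / (1 - f) = 5. */
    return 0;
}
```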
Granularity
Overhead of Parallelism:
• Cost of starting a thread or process
• Cost of communicating shared data
• Cost of synchronizing
• Extra (redundant) computation
Tradeoff:
• Large units of work reduce the overhead of parallelism (see the sketch below)
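To make the tradeoff concrete, here is a hedged OpenMP sketch (the loop bounds and function names are invented for the example): the same computation written with fine-grained and coarse-grained parallelism. The coarse-grained version pays the parallel-region entry/exit and synchronization overhead once rather than once per step.

```c
#include <omp.h>

#define STEPS 1000
#define N 100000

/* Fine-grained: a parallel region is entered and exited on every
   step, so the region start-up and synchronization overhead is
   paid STEPS times. */
void fine_grained(double *a) {
    for (int step = 0; step < STEPS; step++) {
        #pragma omp parallel for
        for (int i = 0; i < N; i++)
            a[i] += 1.0;
    }
}

/* Coarse-grained: one parallel region spans the whole computation,
   so the start-up cost is paid once.  The implicit barrier at the
   end of each "omp for" keeps the steps in order. */
void coarse_grained(double *a) {
    #pragma omp parallel
    for (int step = 0; step < STEPS; step++) {
        #pragma omp for
        for (int i = 0; i < N; i++)
            a[i] += 1.0;
    }
}
```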