Parallel Computing Basics, Semantics
Landau's 1st Rule of Education

Rubin H. Landau
Sally Haerer, Producer-Director

Based on "A Survey of Computational Physics" by Landau, Páez, & Bordeianu,
with support from the National Science Foundation.
Course: Computational Physics II
Parallel Problems: Basic and Assigned

- Impressive parallel (∥) computing hardware advances
- Beyond ∥ I/O, memory, internal CPU
- ∥: multiple processors working on a single problem
- Software stuck in the 1960s
- Message passing = dominant, yet too elementary
- Need sophisticated compilers (OK for cores)
- Need to understand hybrid programming models
- Problem: parallelize a simple program's parameter space
- Why do it? Faster runs, bigger problems, finer resolutions, different problems
∥ Computation Example: Matrix Multiplication
Need Communication, Synchronization, Math

    [B] = [A][B]
    B_{i,j} = Σ_{k=1}^{N} A_{i,k} B_{k,j}

- Each LHS B_{i,j} requires an entire row of [A] and column of [B]
- RHS B_{k,j} = old values (before the multiplication) ⇒ must communicate [B]
- [B] = [A][B]: data dependency, order matters
- [C] = [A][B]: data parallel
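To make the dependency point concrete, here is a minimal C sketch (not from the slides; the matrix size N = 4 and the function names are made up): computing [C] = [A][B] touches only the unchanged inputs, so the element loops are data parallel, while an in-place [B] = [A][B] must first save the old values of [B].

```c
/* Sketch: data parallel vs data dependent matrix multiplication.
   N and the helper names are illustrative assumptions. */
#include <string.h>

#define N 4

/* Data parallel: every C[i][j] uses only the unchanged A and B,
   so the (i,j) iterations can be distributed freely. */
void mat_mult(double A[N][N], double B[N][N], double C[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            C[i][j] = 0.0;
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
        }
}

/* In-place [B] = [A][B]: each new B[i][j] needs the *old* column of B,
   so the pre-multiplication values must be copied (or communicated)
   before overwriting — order matters. */
void mat_mult_inplace(double A[N][N], double B[N][N]) {
    double old[N][N];
    memcpy(old, B, sizeof old);          /* keep the old values of [B] */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            B[i][j] = 0.0;
            for (int k = 0; k < N; k++)
                B[i][j] += A[i][k] * old[k][j];
        }
}
```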
Parallel Computer Categories
Nodes, Communications, Instructions & Data

- CPU-CPU and memory-memory networks; internal (2) & external communication
- Node = processor location; a node may hold 1-N CPUs
- Single instruction, single data (SISD)
- Single instruction, multiple data (SIMD)
- Multiple instructions, multiple data (MIMD)
- MIMD: message passing, no shared memory (cluster)
- MIMD: difficult to program, expensive

[Figure: cluster layout - I/O node, compute nodes, Gigabit / Fast Ethernet switches, FPGA, JTAG]
Relation to Multitasking
Locations in Memory

- Much ∥ processing already on PCs, Unix
- Multitasking ∼ ∥: independent programs A, B, C, D simultaneously in RAM
- Round-robin processing
- SISD: one job at a time; MIMD: multiple jobs at the same time

[Figure: independent jobs A-D resident simultaneously in memory]
Parallel Categories: Granularity

- Grain = a measure of the computational work done between CPU communications
  (the computation / communication ratio)
- Coarse grain: separate programs on separate computers, e.g. MC on 6 Linux PCs
- Medium grain: several simultaneous processors, parallel subroutines
- Fine grain: compiler-level parallelism, e.g. ∥ for loops (see the sketch below)
- Bus = communication channel

[Figure: jobs A-D sharing memory over a bus]
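To make the fine-grain case concrete, here is a small C sketch using an OpenMP parallel-for pragma; OpenMP is my choice of example here (it is not named on the slide), standing in for any compiler/runtime that splits loop iterations across threads. The array size and the reduction are illustrative.

```c
/* Sketch of fine-grain (loop-level) parallelism: with OpenMP enabled
   (e.g. a -fopenmp compile flag), the iterations of one loop are split
   across threads; without it, the pragma is ignored and the code runs
   serially. */
#include <stdio.h>

int main(void) {
    enum { N = 1000000 };
    static double x[N];
    double sum = 0.0;

    /* Fine grain: each iteration is tiny, so the work/communication
       ratio is small and the compiler/runtime does the bookkeeping. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++) {
        x[i] = (double)i;
        sum += x[i];
    }

    printf("sum = %g\n", sum);
    return 0;
}
```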
Distributed Memory ∥ via Commodity PCs
Clusters, Multicomputers, Beowulf, David

- Dominant: coarse-to-medium grain = stand-alone PCs, high-speed switch, messages & network
- Requirement: data in chunks that keep each processor independently busy
- Send data to the nodes, collect results, exchange, ...

[Figure: "Values of Parallel Processing" - mainframe, PC, Beowulf, mini, workstation, vector computer]
Parallel Performance: Amdahl's Law
Simple Accounting of Time

- Clogged ketchup bottle in a cafeteria line: the slowest step determines the overall rate
- Serial parts and communication limit the speedup S_p
- Need ∼ 90% parallel for good speedup
- Need ∼ 100% parallel for massively parallel machines
- Need new problems

[Figure: Amdahl's law - speedup S_p vs the percent of the program that is parallel, curves for p = 2 and p = ∞]
Amdahl's Law Derivation

p = number of CPUs,  T_1 = 1-CPU time,  T_p = p-CPU time

    S_p = maximum parallel speedup = T_1 / T_p → p

- Not achieved in practice: some serial code, data & memory conflicts,
  communication and synchronization of the processors
- f = ∥ (parallel) fraction of the program ⇒

    T_s = (1 - f) T_1                  (serial time)
    T_p = f T_1 / p                    (parallel time)

    Speedup  S_p = T_1 / (T_s + T_p) = 1 / (1 - f + f/p)    (Amdahl's law)
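A quick numerical check of Amdahl's law, S_p = 1 / (1 - f + f/p), as a C sketch; the parallel fractions and processor counts below are chosen only for illustration and are not from the slides.

```c
/* Sketch: evaluate Amdahl's law for a few parallel fractions f and
   processor counts p.  Note how even f = 0.90 caps the speedup near 10
   no matter how many processors are used. */
#include <stdio.h>

double amdahl(double f, int p) {       /* f = parallel fraction, p = #CPUs */
    return 1.0 / ((1.0 - f) + f / p);
}

int main(void) {
    double fracs[] = {0.50, 0.90, 0.99};
    int    procs[] = {2, 16, 1024};

    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            printf("f = %.2f  p = %4d  ->  S_p = %7.2f\n",
                   fracs[i], procs[j], amdahl(fracs[i], procs[j]));
    return 0;
}
```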
Amdahl's Law + Communication Overhead
Include Communication Time; Simple & Profound

- Latency T_c = time to move data

    S_p ≃ T_1 / (T_1/p + T_c)  <  p

- For the communication time not to matter:

    T_1/p ≫ T_c   ⇒   p ≪ T_1/T_c

- As the number of processors p increases, T_1/p → T_c
- Then more processors ⇒ slower; a faster CPU becomes irrelevant
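The same kind of check for the communication-limited speedup S_p ≃ T_1 / (T_1/p + T_c); the run time T_1 and latency T_c below are made-up numbers, chosen only to show the saturation near T_1/T_c.

```c
/* Sketch: speedup including a fixed communication time T_c.
   T1 and Tc are illustrative assumptions, not measured values. */
#include <stdio.h>

int main(void) {
    double T1 = 100.0;   /* assumed one-CPU run time (seconds)        */
    double Tc = 0.5;     /* assumed communication (latency) time      */

    for (int p = 1; p <= 1024; p *= 4) {
        double Sp = T1 / (T1 / p + Tc);
        printf("p = %4d   S_p = %7.2f   (ideal %4d)\n", p, Sp, p);
    }
    /* As p grows, T1/p approaches Tc and S_p saturates near T1/Tc = 200;
       beyond that, adding processors no longer helps. */
    return 0;
}
```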
How to Actually Parallelize

- User creates tasks; tasks are assigned to processors/threads
- Main routine: master, controller
- Subtasks: parallel subroutines, slaves
- Avoid storage conflicts
- Handle communication and synchronization
- Don't sacrifice science for speed

[Figure: main task program - main routine calls serial subroutine a and parallel subroutines 1-3, followed by a summation task]
Practical Aspects of Message Passing; Don't Do It
More Processors = More Challenge

- Only the most numerically intensive ∥ codes are worth it
- Legacy codes are often Fortran90
- Rewrite (N months) vs modify the serial code (∼ 70%)?
- Steep learning curve, failures, hard debugging
- Preconditions: the program runs often, for days, with little change
- Need higher resolution, more bodies
- The problem itself affects the parallelism: data use, problem structure
  - Perfectly (embarrassingly) parallel: (MC) repeats (see the sketch below)
  - Fully synchronous: data ∥ (MD), tightly coupled
  - Loosely synchronous: (groundwater diffusion)
  - Pipeline parallel: (data → images → animations)
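A sketch of the "perfectly (embarrassingly) parallel" case referenced above: each of several independent workers takes its own slice of a parameter sweep and needs no communication with the others. The parameter range, the work() stand-in, and the 4-node loop are hypothetical; in a real run, myid and numnodes would come from the message-passing library introduced on the next slide.

```c
/* Sketch: embarrassingly parallel parameter-space sweep with a simple
   block decomposition.  work() and the parameter mapping are placeholders. */
#include <stdio.h>

static double work(double param) {     /* stand-in for the real simulation */
    return param * param;
}

static void run_slice(int myid, int numnodes, int ntotal) {
    /* Contiguous block decomposition of ntotal parameter values. */
    int chunk = (ntotal + numnodes - 1) / numnodes;
    int start = myid * chunk;
    int end   = (start + chunk < ntotal) ? start + chunk : ntotal;

    for (int i = start; i < end; i++) {
        double param = 0.01 * i;       /* map index -> parameter value */
        printf("node %d: param %.2f -> %.4f\n", myid, param, work(param));
    }
}

int main(void) {
    /* Pretend there are 4 nodes; no node ever needs another's data. */
    for (int id = 0; id < 4; id++)
        run_slice(id, 4, 10);
    return 0;
}
```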
High-Level View of Message Passing
4 Simple Communication Commands

- Simple basics: C or Fortran plus 4 communication commands
  - send: send a named message to a named processor
  - receive: receive a message from any sender
  - myid: this processor's ID number
  - numnodes: the number of nodes
- (see the sketch below)

[Figure: timeline of a master creating slaves 1 and 2, with compute, send, and receive steps interleaved over time]
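A minimal sketch of the four basics on this slide (send, receive, myid, numnodes), written here with the corresponding standard MPI calls (MPI_Send, MPI_Recv, MPI_Comm_rank, MPI_Comm_size); the master/slave division of labor and the message contents are invented for illustration.

```c
/* Sketch: master collects one number from each slave.
   Compile with an MPI wrapper (e.g. mpicc) and run with 2+ processes. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int myid, numnodes;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);      /* myid: this processor's ID */
    MPI_Comm_size(MPI_COMM_WORLD, &numnodes);  /* numnodes: total processors */

    if (myid == 0) {                           /* master: receive from slaves */
        double result;
        for (int src = 1; src < numnodes; src++) {
            MPI_Recv(&result, 1, MPI_DOUBLE, src, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("master received %g from node %d\n", result, src);
        }
    } else {                                   /* slaves: compute, then send */
        double result = 10.0 * myid;           /* stand-in for real work */
        MPI_Send(&result, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```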
∥ MP: What Can Go Wrong?
Hardware Communication = Problematic

- Task cooperation and division
- Correct division of the data
- Many low-level details
- Distributed error messages
- Messages arriving in the wrong order
- Race conditions: the result depends on the order of events
- Deadlock: processes wait forever (see the sketch below)

[Figure: same master/slave timeline as the previous slide]
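A sketch of the "wait forever" (deadlock) bullet: two ranks each issue a blocking send to the other before posting a receive. Whether a standard-mode MPI_Send actually blocks depends on internal buffering, so the message size below is an assumption chosen to make blocking likely; the fix noted in the comments is one common remedy.

```c
/* Sketch: a classic send-send deadlock between two ranks.
   Run with exactly 2 processes; NBIG is an assumed "too big to buffer" size. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define NBIG (1 << 22)

int main(int argc, char *argv[]) {
    int myid, other;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    other = 1 - myid;                          /* the partner rank */

    double *out = calloc(NBIG, sizeof(double));
    double *in  = calloc(NBIG, sizeof(double));

    /* Likely DEADLOCK: both ranks block in MPI_Send, each waiting for a
       matching receive that is never reached.  A fix: reverse the
       send/receive order on one rank, or use MPI_Sendrecv. */
    MPI_Send(out, NBIG, MPI_DOUBLE, other, 0, MPI_COMM_WORLD);
    MPI_Recv(in,  NBIG, MPI_DOUBLE, other, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);

    printf("rank %d done (never printed if deadlocked)\n", myid);

    free(out);
    free(in);
    MPI_Finalize();
    return 0;
}
```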
Conclude: IBM Blue Gene = ∥ by Committee

- Designed for performance per watt
- Peak = 360 teraflops (10^12 flops)
- On-chip and off-chip memory
- Medium-speed, 2-core CPU at 5.6 Gflops (runs cool)
- 1 core computes, 1 core communicates
- 512 chips/card, 16 cards/board
- 65,536 (2^16) nodes
- Control: MPI