Technische Universität München
Parallel Programming and High-Performance Computing
Part 1: Introduction
Dr. Ralf-Peter Mundani, CeSIM / IGSSE / CiE, Technische Universität München
1 Introduction: General Remarks

- Ralf-Peter Mundani
  - email mundani@tum.de, phone 289-25057, room 3181 (city centre)
  - consultation hour: by appointment
- Atanas Atanasov
  - email atanasoa@in.tum.de, phone 289-18615, room 02.05.036
- lecture (2 SWS): weekly, Tuesday, 14:00-15:30, room 02.07.23
- exercise (1 SWS): fortnightly, Wednesday, 08:30-10:00, room 02.07.23
- materials: http://www5.in.tum.de/
1 Introduction: General Remarks

- content
  - part 1: introduction
  - part 2: high-performance networks
  - part 3: foundations
  - part 4: programming memory-coupled systems
  - part 5: programming message-coupled systems
  - part 6: dynamic load balancing
  - part 7: examples of parallel algorithms
1 Introduction: Overview

- motivation
- hardware excursion
- supercomputers
- classification of parallel computers
- levels of parallelism
- quantitative performance evaluation

"I think there is a world market for maybe five computers."
(Thomas Watson, chairman of IBM, 1943)
1 Introduction: Motivation

- numerical simulation: from phenomena to predictions
  - starting point: a physical phenomenon or technical process
  1. modelling: determination of parameters, expression of relations
  2. numerical treatment: model discretisation, algorithm development
  3. implementation: software development, parallelisation
  4. visualisation: illustration of abstract simulation results
  5. validation: comparison of results with reality
  6. embedding: insertion into the working process
  - disciplines involved along this pipeline: mathematics, computer science, and the application domain
1 Introduction: Motivation

- why numerical simulation?
  - because experiments are sometimes impossible
    - e.g. life cycle of galaxies, weather forecast, terror attacks such as the bomb attack on the WTC (1993)
  - because experiments are sometimes not welcome
    - e.g. avalanches, nuclear tests, medicine
1 Introduction: Motivation

- why numerical simulation? (cont'd)
  - because experiments are sometimes very costly and time-consuming
    - e.g. protein folding, material sciences, the Mississippi basin model (Jackson, MS)
  - because experiments are sometimes more expensive than a simulation
    - e.g. aerodynamics, crash tests
1 Introduction: Motivation

- why parallel programming and HPC?
  - complex problems (especially the so-called "grand challenges") demand more computing power
    - climate or geophysics simulation (e.g. tsunami)
    - structure or flow simulation (e.g. crash test)
    - development systems (e.g. CAD)
    - large data analysis (e.g. the Large Hadron Collider at CERN)
    - military applications (e.g. cryptanalysis)
    - ...
  - performance increase due to
    - faster hardware, more memory ("work harder")
    - more efficient algorithms, optimisation ("work smarter")
    - parallel computing ("get some help")
1 Introduction: Motivation

- objectives (assuming all resources were available N times over)
  - throughput: compute N problems simultaneously
    - running N instances of a sequential program with different data sets ("embarrassing parallelism"); e.g. SETI@home
    - drawback: limited resources of single nodes
  - response time: compute one problem in a fraction (1/N) of the time
    - running one instance (i.e. N processes) of a parallel program for jointly solving a problem; e.g. finding prime numbers (see the sketch below)
    - drawback: writing a parallel program; communication
  - problem size: compute one problem with N-times larger data
    - running one instance (i.e. N processes) of a parallel program, using the sum of all local memories to compute larger problem sizes; e.g. the iterative solution of systems of linear equations (SLE)
    - drawback: writing a parallel program; communication
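As a concrete illustration of the response-time objective, here is a minimal sketch (not from the slides) of N processes jointly counting prime numbers with MPI, which is only introduced in part 5. The limit of 1,000,000 and the interleaved distribution of candidates are assumptions made purely for illustration.

```c
/* Minimal sketch of the "response time" objective: N MPI processes
 * jointly count the primes below LIMIT. Compile with an MPI wrapper,
 * e.g. mpicc, and run with mpirun -np 4 ./primes.                     */
#include <mpi.h>
#include <stdio.h>

#define LIMIT 1000000L   /* assumed problem size for illustration */

static int is_prime(long n)
{
    if (n < 2) return 0;
    for (long d = 2; d * d <= n; ++d)
        if (n % d == 0) return 0;
    return 1;
}

int main(int argc, char **argv)
{
    int rank, size;
    long local = 0, total = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* each of the N processes tests every N-th candidate */
    for (long n = 2 + rank; n < LIMIT; n += size)
        local += is_prime(n);

    /* combine the partial counts on process 0 */
    MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("%ld primes below %ld (using %d processes)\n",
               total, (long)LIMIT, size);

    MPI_Finalize();
    return 0;
}
```

Ideally, doubling the number of processes halves the response time; the price, as the slide notes, is writing the parallel program and paying for the communication (here the final reduction).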
1 Introduction: Overview

- motivation
- hardware excursion
- supercomputers
- classification of parallel computers
- levels of parallelism
- quantitative performance evaluation
1 Introduction: Hardware Excursion

- definition of parallel computers
  - "A collection of processing elements that communicate and cooperate to solve large problems" (Almasi and Gottlieb, 1989)
- possible appearances of such processing elements
  - specialised units (e.g. the steps of a vector pipeline)
  - parallel features in modern monoprocessors (instruction pipelining, superscalar architectures, VLIW, multithreading, multicore, ...)
  - several uniform arithmetical units (e.g. the processing elements of array computers or GPUs)
  - complete stand-alone computers connected via LAN (workstation or PC clusters, so-called virtual parallel computers)
  - parallel computers or clusters connected via WAN (so-called metacomputers)
1 Introduction: Hardware Excursion

- reminder: arithmetic logical unit (ALU)
  - schematic layout of the (classical 32-bit) ALU: two operand registers A and B feed the ALU, the result is written to register C, i.e. C ← A ⊗ B for an arithmetic operation ⊗; the registers are connected to main memory via a 32-bit data bus
1 Introduction: Hardware Excursion

- reminder: memory hierarchy
  - levels, from fast and small to slow and large (access speed decreases, capacity increases): register, cache, main memory, background memory, archive memory
  - data moves between the levels at different granularities: single accesses (register/cache), blocks (cache/main memory, see the illustration below), pages (main memory/background memory), serial accesses (background/archive memory)
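A small, hedged illustration (not part of the slide) of why the block-wise transfers between cache and main memory matter: traversing a matrix along its storage order reuses each cached block, traversing across it does not. The size N = 2048 is an arbitrary choice, merely large enough that the matrix does not fit into a typical cache.

```c
/* Cache illustration: C stores the matrix row by row, so the row-wise
 * loop touches consecutive addresses and reuses every cached block,
 * while the column-wise loop jumps N doubles per access and triggers
 * far more (slow) block transfers from main memory.                   */
#include <stdio.h>
#include <stdlib.h>

#define N 2048   /* assumed size, chosen to exceed typical cache sizes */

int main(void)
{
    double *a = calloc((size_t)N * N, sizeof *a);
    double sum = 0.0;
    if (a == NULL) return 1;

    /* row-wise traversal: cache friendly, consecutive accesses */
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            sum += a[i * N + j];

    /* column-wise traversal: cache unfriendly, stride-N accesses */
    for (int j = 0; j < N; ++j)
        for (int i = 0; i < N; ++i)
            sum += a[i * N + j];

    printf("%f\n", sum);
    free(a);
    return 0;
}
```

Timing the two loop nests separately typically shows the column-wise version to be several times slower, although the exact factor depends on the machine.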
1 Introduction: Hardware Excursion

- instruction pipelining
  - instruction execution involves several operations
    1. instruction fetch (IF)
    2. decode (DE)
    3. fetch operands (OP)
    4. execute (EX)
    5. write back (WB)
    which are executed successively
  - hence, only one part of the CPU is busy at any given moment: instruction N + 1 starts its IF stage only after instruction N has completed all five stages
1 Introduction: Hardware Excursion

- instruction pipelining (cont'd)
  - observation: while one stage of an instruction is being processed, the hardware for all other stages is idle
  - hence, multiple instructions can be overlapped in execution: instruction pipelining (similar to assembly lines)
  - advantage: no additional hardware necessary
  - timing sketch: in every cycle a new instruction enters the pipeline, so instructions N, N + 1, N + 2, ... pass through IF, DE, OP, EX, WB shifted by one stage each (a short estimate of the gain follows below)
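A short back-of-the-envelope estimate, under the simplifying assumptions (not stated on the slide) of one clock cycle per stage and no pipeline stalls:

```latex
% k pipeline stages, N instructions, one cycle per stage, no stalls
T_{\text{sequential}} = k \cdot N, \qquad
T_{\text{pipelined}}  = k + (N - 1), \qquad
S(N) = \frac{k \cdot N}{k + N - 1} \;\xrightarrow{\;N \to \infty\;}\; k
% example: k = 5, N = 100  =>  500 cycles vs. 104 cycles, speedup ~ 4.8
```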
1 Introduction: Hardware Excursion

- superscalar
  - a CPU containing several ALUs might execute several instructions in parallel, with static or dynamic (i.e. out-of-order execution) scheduling
  - timing sketch: in a 2-way superscalar pipeline, two instructions (N and N + 1) enter the pipeline per cycle and pass through IF, DE, OP, EX, WB side by side, followed by N + 2 and N + 3 one cycle later, and so on up to instruction N + 9 in the sketch (a short estimate follows below)
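Extending the estimate above to an idealised superscalar pipeline that issues w instructions per cycle (again assuming no dependencies or stalls, which real programs rarely satisfy):

```latex
% k stages, issue width w, N instructions, no dependencies or stalls
T_{\text{superscalar}} \approx k + \left\lceil \tfrac{N}{w} \right\rceil - 1
% example matching the sketch: k = 5, w = 2, N = 10
% => 5 + 5 - 1 = 9 cycles, instead of 14 (scalar pipeline) or 50 (sequential)
```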
1 Introduction: Hardware Excursion

- very long instruction word (VLIW)
  - in contrast to superscalar architectures, the compiler groups instructions that can be executed in parallel during compilation (pipelining is still possible)
  - a VLIW instruction bundles several slots (instr. 1 to instr. 4 in the sketch) that operate on a shared set of registers
  - advantage: no additional hardware logic necessary
  - drawback: the slots cannot always be filled, so unused slots are padded with dummy operations (NOPs); see the fragment below
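A small C fragment may illustrate the drawback; the 4-slot width matches the sketch but is otherwise an assumption, and the grouping is of course done by the VLIW compiler, not by the programmer:

```c
/* Illustration for a hypothetical 4-slot VLIW: the compiler packs
 * independent operations into one long instruction word.              */

void independent(int *a, int *b, int *c, int *d)
{
    /* four independent additions: a 4-slot VLIW compiler could place
     * them into a single bundle  [ add | add | add | add ]            */
    a[0] += 1;
    b[0] += 2;
    c[0] += 3;
    d[0] += 4;
}

int dependent(int x)
{
    /* a chain of dependent operations: every step needs the previous
     * result, so most slots of each bundle stay empty (NOP filling)   */
    x = x + 1;
    x = x * 3;
    x = x - 5;
    return x;
}
```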