ECE-451/ECE-566 - Introduction to Parallel and Distributed Programming
Lecture 2: Parallel Architectures and Programming Models
Department of Electrical & Computer Engineering, Rutgers University

Machine Architectures and Interconnection Networks
Architecture Spectrum
• Shared-Everything – Symmetric Multiprocessors
• Shared Memory – NUMA, CC-NUMA
• Distributed Memory – DSM, Message Passing
• Shared-Nothing – Clusters, NOWs
• Client/Server

Pros and Cons
• Shared Memory
  – Pros: flexible, easier to program
  – Cons: not scalable; synchronization/coherency issues
• Distributed Memory
  – Pros: scalable
  – Cons: difficult to program; requires explicit message passing
Conventional Computer
Consists of a processor executing a program stored in a (main) memory:

[Figure: main memory supplies instructions to the processor; data flows to and from the processor.]

Each main memory location is identified by its address. Addresses start at 0 and extend to 2^b - 1 when there are b bits (binary digits) in the address; e.g., a 32-bit address identifies 2^32 locations.

Shared Memory Multiprocessor System
A natural way to extend the single-processor model: have multiple processors connected to multiple memory modules, such that each processor can access any memory module.

[Figure: memory modules forming one address space, connected to the processors through an interconnection network.]
Simplistic View of a Small Shared Memory Multiprocessor

[Figure: processors connected to a shared memory over a common bus.]

Examples:
• Dual Pentiums
• Quad Pentiums

Quad Pentium Shared Memory Multiprocessor

[Figure: four processors, each with its own L1 cache, L2 cache, and bus interface, on a shared processor/memory bus; a memory controller attaches the shared memory, and an I/O interface bridges to the I/O bus.]
Programming Shared Memory Multiprocessors
Use:
• Threads - the programmer decomposes the program into individual parallel sequences (threads), each able to access variables declared outside the threads. Example: Pthreads.
• Sequential programming language with preprocessor compiler directives to declare shared variables and specify parallelism. Example: OpenMP - industry standard - needs an OpenMP compiler.
• Sequential programming language with added syntax to declare shared variables and specify parallelism. Example: UPC (Unified Parallel C) - needs a UPC compiler.
• Parallel programming language with syntax to express parallelism - the compiler creates executable code for each processor (not now common).
• Sequential programming language plus a parallelizing compiler that converts it into parallel executable code - also not now common.

Distributed Shared Memory
• Making the main memory of a group of interconnected computers look like a single memory with a single address space.
• Shared memory programming techniques can then be used.

[Figure: computers, each holding a share of the memory, exchanging messages over an interconnection network to present one shared memory.]
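To make the thread-based approach above concrete, here is a minimal Pthreads sketch (illustrative, not from the slides; names such as worker and NTHREADS are made up). All threads update a variable declared outside the threads, so a mutex is needed to avoid the synchronization/coherency issues noted on the Pros and Cons slide.

```c
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

/* Variable declared outside the threads -- visible to all of them. */
static int counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    long id = (long)arg;
    /* The mutex serializes updates to the shared variable. */
    pthread_mutex_lock(&lock);
    counter++;
    pthread_mutex_unlock(&lock);
    printf("thread %ld done\n", id);
    return NULL;
}

int main(void)
{
    pthread_t threads[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&threads[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(threads[i], NULL);
    printf("counter = %d\n", counter);   /* always NTHREADS */
    return 0;
}
```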
Message-Passing Multicomputer
• Complete computers connected through an interconnection network.

[Figure: computers, each with a processor and local memory, exchanging messages over an interconnection network.]

Interconnection Networks
• Limited and exhaustive interconnections
• 2- and 3-dimensional meshes
• Hypercube (not now common)
• Using switches:
  – Crossbar
  – Trees
  – Multistage interconnection networks
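The slide leaves programming details for later, but the standard way to program a message-passing multicomputer today is MPI. A minimal send/receive sketch, assuming at least two processes (MPI is not named on this slide; the tag value 0 and the message text are illustrative):

```c
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    int rank;
    char msg[32];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {                       /* sender */
        strcpy(msg, "hello");
        MPI_Send(msg, 32, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {                /* receiver */
        MPI_Recv(msg, 32, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received: %s\n", msg);
    }

    MPI_Finalize();
    return 0;
}
```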
Two-Dimensional Array (Mesh)

[Figure: computers/processors arranged in a grid, each linked to its nearest neighbors.]

Also three-dimensional - used in some large high-performance systems.

Three-Dimensional Hypercube

[Figure: eight nodes labeled 000 through 111 at the corners of a cube; each node links to the three nodes whose labels differ from its own in exactly one bit.]
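The binary labels in the figure encode the wiring rule: two nodes are connected if and only if their labels differ in exactly one bit, so a node's d neighbors are found by flipping each of its d bits in turn. A small C sketch of this rule (illustrative, not from the slides):

```c
#include <stdio.h>

/* Print the neighbors of `node` in a d-dimensional hypercube.
   Each neighbor's label differs from `node` in exactly one bit. */
void hypercube_neighbors(unsigned node, unsigned d)
{
    for (unsigned i = 0; i < d; i++)
        printf("dimension %u neighbor: %u\n", i, node ^ (1u << i));
}

int main(void)
{
    hypercube_neighbors(5, 3);   /* node 101: prints 100, 111, 001 */
    return 0;
}
```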
Four-Dimensional Hypercube

[Figure: sixteen nodes labeled 0000 through 1111, forming two three-dimensional cubes with corresponding corners linked.]

Hypercubes popular in the 1980s - not now.

Crossbar Switch

[Figure: a grid of switches giving every processor a dedicated path to every memory module.]
Tree

[Figure: processors at the leaves of a binary tree; switch elements at the internal nodes and root route traffic over the links.]

Multistage Interconnection Network
Example: Omega network
• 2 × 2 switch elements (straight-through or crossover connections)

[Figure: eight inputs, 000 through 111, routed to eight outputs, 000 through 111, through three stages of 2 × 2 switches.]
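A useful property of the Omega network, standard for this topology though not stated on the slide, is destination-tag routing: the switch at stage i examines bit i of the destination address (most significant bit first), taking the straight/upper output on 0 and the crossover/lower output on 1. A sketch:

```c
#include <stdio.h>

/* Destination-tag routing through an Omega network with `stages`
   stages: at stage i, inspect the i-th destination bit, starting
   from the most significant; 0 -> upper output, 1 -> lower output. */
void omega_route(unsigned dest, unsigned stages)
{
    for (int i = (int)stages - 1; i >= 0; i--)
        printf("stage %u: %s\n", stages - 1 - (unsigned)i,
               (dest >> i) & 1u ? "lower output" : "upper output");
}

int main(void)
{
    omega_route(5, 3);   /* destination 101: lower, upper, lower */
    return 0;
}
```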
Taxonomy of HPC Architectures
• Flynn (1966) created a simple classification for computers based upon the number of instruction streams and data streams:
  – SISD - conventional
  – SIMD - data parallel, vector computing
  – MISD - systolic arrays
  – MIMD - very general, multiple approaches
• Current focus is on the MIMD model, using general-purpose processors or multicomputers.
HPC Architecture Examples
• SISD - mainframes, workstations, PCs.
• SIMD Shared Memory - vector machines, Cray...
• MIMD Shared Memory - Sequent, KSR, Tera, SGI, Sun.
• SIMD Distributed Memory - DAP, TMC CM-2...
• MIMD Distributed Memory - Cray T3D, Intel, Transputers, TMC CM-5, plus recent workstation clusters (IBM SP2, DEC, Sun, HP).

Note: Modern sequential machines are not purely SISD - advanced RISC processors use many concepts from vector and parallel architectures (pipelining, parallel execution of instructions, prefetching of data, etc.) in order to achieve one or more arithmetic operations per clock cycle.

SISD: A Conventional Computer

[Figure: a single processor consuming one instruction stream and one input data stream, producing output data.]

• Single-processor computer - a single stream of instructions is generated from the program. Instructions operate upon a single stream of data items.
• Speed is limited by the rate at which the computer can transfer information internally.
• e.g., PC, Macintosh, workstations.
The MISD Architecture

[Figure: three processors (A, B, C), each fed its own instruction stream, operating in turn on a single data stream from input to output.]

• More of an intellectual exercise than a practical configuration. Few built, and not commercially available.

Single Instruction Stream-Multiple Data Stream (SIMD) Computer
• A specially designed computer - a single instruction stream from a single program, but multiple data streams exist.
• A single source program is written and each processor executes its personal copy of this program, although independently and not in synchronism.
• Developed because a number of important applications mostly operate upon arrays of data.
• The source program can be constructed so that parts of the program are executed by certain computers and not others, depending upon the identity of the computer.
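The last bullet describes the usual single-program idiom: every computer runs the same source, with sections guarded by the computer's identity. A minimal sketch using an MPI rank as the identity (MPI is one common way to express this; it is an assumption here, not named on the slide):

```c
#include <mpi.h>
#include <stdio.h>

/* One source program; every processor runs its own copy and
   selects work based on its identity (here, the MPI rank). */
int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        printf("rank 0: coordinating\n");   /* run only by node 0 */
    else
        printf("rank %d: computing on my slice of the data\n", rank);

    MPI_Finalize();
    return 0;
}
```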