Why Multiprocessors? Limits on the performance of a single processor: what are they?
Spring 2009, CSE 471 - Multiprocessors
  1. Why Multiprocessors? Limits on the performance of a single processor: what are they?
     Why Multiprocessors: Lots of opportunity
     • Scientific computing/supercomputing
       • Examples: weather simulation, aerodynamics, protein folding
       • Each processor computes for a part of the grid (see the sketch after this item)
     • Server workloads
       • Example: airline reservation database
       • Many concurrent updates, searches, lookups, queries
       • Processors handle different requests
     • Media workloads
       • Processors compress/decompress different parts of an image/frames
     • Desktop workloads…
     • Gaming workloads…
     What would you do with 500 million transistors?
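
The "each processor computes for a part of the grid" bullet can be made concrete with a small sketch. The C/pthreads code below is only an illustration, not something from the slides: the grid size, thread count, and the toy averaging step are arbitrary choices, but it shows each "processor" (here, a thread) updating its own block of rows of a shared 2-D grid.

```c
/* Sketch: partition a 2-D grid across threads, each computing its own block of
 * rows (C + pthreads).  NROWS, NCOLS, NTHREADS and the averaging step are
 * arbitrary illustrative choices. */
#include <pthread.h>

#define NROWS    1024
#define NCOLS    1024
#define NTHREADS 4                     /* assumes NROWS % NTHREADS == 0 */

static double grid[NROWS][NCOLS];
static double next[NROWS][NCOLS];

static void *update_block(void *arg) {
    long id    = (long)arg;
    long first = id * (NROWS / NTHREADS);   /* this thread's row block */
    long last  = first + (NROWS / NTHREADS);

    for (long i = first; i < last; i++)
        for (long j = 1; j < NCOLS - 1; j++)
            next[i][j] = 0.5 * (grid[i][j - 1] + grid[i][j + 1]);  /* toy stencil */
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    for (long id = 0; id < NTHREADS; id++)
        pthread_create(&t[id], NULL, update_block, (void *)id);
    for (long id = 0; id < NTHREADS; id++)
        pthread_join(t[id], NULL);
    return 0;
}
```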

  2. Issues in Multiprocessors: Which programming model for interprocessor communication?
     • shared memory
       • regular loads & stores
       • SPARCCenter, SGI Challenge, Cray T3D, Convex Exemplar, KSR-1&2, today's CMPs
     • message passing
       • explicit sends & receives (see the sketch after this item)
       • TMC CM-5, Intel Paragon, IBM SP-2
     Which execution model?
     • control parallel
       • identify & synchronize different asynchronous threads
     • data parallel
       • same operation on different parts of the shared data space
     Issues in Multiprocessors: How to express parallelism
     • language support
       • HPF, ZPL
     • runtime library constructs
       • coarse-grain, explicitly parallel C programs
     • automatic (compiler) detection
       • implicitly parallel C & Fortran programs, e.g., SUIF & PTRANS compilers
     Application development
     • embarrassingly parallel programs can be parallelized easily
     • development of different algorithms for the same problem
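
The two communication models differ most visibly in code. Below is a minimal message-passing sketch in C using MPI, chosen here only as a familiar stand-in for the explicit send/receive style of machines like the CM-5 or SP-2 (MPI itself is not named on the slide); in the shared-memory model the same data would simply be read and written with ordinary loads and stores. Built with mpicc and launched with, e.g., mpirun -np 2.

```c
/* Minimal message-passing sketch (C + MPI): rank 0 sends an array to rank 1.
 * The tag value and array size are arbitrary choices for illustration.
 * Run with at least two ranks, e.g. mpirun -np 2 ./a.out */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double buf[4] = {0.0, 0.0, 0.0, 0.0};

    if (rank == 0) {
        for (int i = 0; i < 4; i++) buf[i] = i * 1.5;          /* produce data     */
        MPI_Send(buf, 4, MPI_DOUBLE, 1, 42, MPI_COMM_WORLD);   /* explicit send    */
    } else if (rank == 1) {
        MPI_Recv(buf, 4, MPI_DOUBLE, 0, 42, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                            /* explicit receive */
        printf("rank 1 got %.1f %.1f %.1f %.1f\n",
               buf[0], buf[1], buf[2], buf[3]);
    }

    MPI_Finalize();
    return 0;
}
```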

  3. Issues in Multiprocessors: How to get good parallel performance
     • recognize parallelism
     • transform programs to increase parallelism without decreasing processor locality
     • decrease sharing costs
     Flynn Classification
     • SISD: single instruction stream, single data stream
       • single-context uniprocessors
     • SIMD: single instruction stream, multiple data streams
       • exploits data parallelism (see the loop sketch after this item)
       • example: Thinking Machines CM
     • MISD: multiple instruction streams, single data stream
       • systolic arrays
       • example: Intel iWarp, today's streaming processors
     • MIMD: multiple instruction streams, multiple data streams
       • multiprocessors, multithreaded processors
       • parallel programming & multiprogramming
       • relies on control parallelism: execute & synchronize different asynchronous threads of control
       • example: most processor companies have CMP configurations
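
To make the SIMD row of the classification concrete: in the data-parallel style, one instruction stream applies the same operation across many data elements. The C sketch below is only an illustration (the function name and the use of the OpenMP `simd` pragma are my choices, not from the slide); the pragma asks the compiler to map the loop onto the processor's vector/SIMD unit.

```c
/* Data-parallel (SIMD-style) sketch in C: the same multiply-add is applied to
 * every element of the arrays.  With OpenMP 4.0+, "#pragma omp simd" requests
 * vectorized code for the loop (compile with e.g. -fopenmp or -fopenmp-simd).
 * Names and parameters are arbitrary illustrative choices. */
#include <stddef.h>

void saxpy(size_t n, float a, const float *x, float *y) {
    #pragma omp simd
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];   /* one operation, many data elements */
    }
}
```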

  4. CM-1 (figure)
     Systolic Array (figure)

  5. MIMD: Low-end
     • bus-based
       • simple, but a bottleneck
       • simple cache coherency protocol
     • physically centralized memory
     • uniform memory access (UMA machine)
     • Sequent Symmetry, SPARCCenter, Alpha-, PowerPC-, or SPARC-based servers, most of today's CMPs
     Low-end MP (figure)

  6. MIMD: High-end
     • higher-bandwidth, multiple-path interconnect
       • more scalable
       • more complex cache coherency protocol (if shared memory)
       • longer latencies
     • physically distributed memory
     • non-uniform memory access (NUMA machine); see the placement sketch after this item
     • could have processor clusters
     • SGI Challenge, Convex Exemplar, Cray T3D, IBM SP-2, Intel Paragon, Sun T1
     High-end MP (figure)
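
Because a NUMA machine's memory is physically distributed, where data lands matters. As a hedged illustration only (not from the slides), the C/OpenMP sketch below relies on an assumed first-touch page-placement policy: each thread initializes the portion of the arrays it will later compute on, so those pages end up in that thread's local memory.

```c
/* First-touch NUMA placement sketch (C + OpenMP).  Assumes the OS places a page
 * in the memory of the node whose thread first writes it, so initializing in
 * parallel with the same static schedule as the compute loop keeps accesses
 * mostly local.  N is an arbitrary size; compile with -fopenmp. */
#include <stdlib.h>

#define N (1 << 24)

int main(void) {
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);

    /* Initialize in parallel: each thread touches (and thus places) its chunk. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++) { a[i] = 0.0; b[i] = (double)i; }

    /* Compute with the same static schedule, so each thread reads and writes
     * the pages it placed during initialization. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++) { a[i] = 2.0 * b[i]; }

    free(a); free(b);
    return 0;
}
```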

  7. Comparison of Issue Capabilities (figure)
     Shared Memory vs. Message Passing
     Shared memory
     + simple parallel programming model
       • global shared address space
       • no need to worry about data locality, but you get better performance when you program for data placement (lower latency when data is local)
       • can do data placement if it is crucial, but don't have to
       • hardware maintains data coherence
       • synchronize to order processors' accesses to shared data (see the lock sketch after this item)
       • like uniprocessor code, so parallelizing by programmer or compiler is easier
       ⇒ can focus on program semantics, not interprocessor communication
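
A minimal shared-memory sketch of the points above, in C with pthreads (the counter, thread count, and iteration count are arbitrary): communication is just ordinary loads and stores to a global address space, and a lock supplies the synchronization that orders the processors' accesses to the shared data.

```c
/* Shared-memory sketch (C + pthreads): all threads touch the same counter with
 * ordinary loads and stores; a mutex orders their accesses. */
#include <pthread.h>
#include <stdio.h>

static long counter = 0;                      /* shared data: one address space */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);            /* synchronize to order accesses   */
        counter++;                            /* just a load and a store         */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t[4];
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    printf("counter = %ld\n", counter);       /* prints 400000 */
    return 0;
}
```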

  8. Shared Memory vs. Message Passing
     Shared memory (continued)
     + low latency (no message-passing software)
       • but overlap of communication & computation (latency-hiding techniques) can be applied to message passing machines too
     + higher bandwidth for small transfers
       • but usually the only choice
     Message passing
     + abstraction in the programming model encapsulates the communication costs
       • but a more complex programming model: additional language constructs, and the need to program for nearest-neighbor communication
     + no coherency hardware
     + good throughput on large transfers
       • but what about small transfers?
     + more scalable (memory latency for uniform memory doesn't scale with the number of processors)
       • but large-scale SM has distributed memory also
       • hah! so you're going to adopt the message-passing model?

  9. Shared Memory vs. Message Passing
     Why there was a debate
     • little experimental data
     • implementation was not separated from the programming model
     • can emulate one paradigm with the other
       • MP on an SM machine
         • message buffers in local (to each processor) memory
         • copy messages by ld/st between buffers (see the sketch after this item)
       • SM on an MP machine
         • each ld/st becomes a message copy
         • sloooooooooow
     Who won?
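
As a rough sketch of the "MP on an SM machine" bullet, the C/pthreads code below emulates message passing with a one-slot mailbox per receiver held in shared memory; mailbox_t, mbox_send, and mbox_recv are made-up names for illustration, not a real API. A send is just a memcpy into the receiver's buffer (ordinary stores) and a receive copies it back out (ordinary loads), with a mutex and condition variable ordering the two sides.

```c
/* Toy emulation of message passing on a shared-memory machine (C + pthreads).
 * Each "processor" (thread) owns a one-slot mailbox; send/recv are memcpy
 * between buffers, i.e., plain loads and stores, ordered by a lock. */
#include <pthread.h>
#include <string.h>

#define MSG_BYTES 64

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  ready;
    int             full;               /* 1 if a message is waiting            */
    char            buf[MSG_BYTES];     /* message buffer "local" to the owner  */
} mailbox_t;

/* One mailbox, e.g. owned by "processor" 1. */
static mailbox_t mbox1 = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, {0} };

void mbox_send(mailbox_t *mb, const void *msg, size_t len) {   /* assumes len <= MSG_BYTES */
    pthread_mutex_lock(&mb->lock);
    while (mb->full)                    /* wait until the previous message is consumed */
        pthread_cond_wait(&mb->ready, &mb->lock);
    memcpy(mb->buf, msg, len);          /* "send" = copy by stores */
    mb->full = 1;
    pthread_cond_broadcast(&mb->ready);
    pthread_mutex_unlock(&mb->lock);
}

void mbox_recv(mailbox_t *mb, void *msg, size_t len) {
    pthread_mutex_lock(&mb->lock);
    while (!mb->full)                   /* wait for a message to arrive */
        pthread_cond_wait(&mb->ready, &mb->lock);
    memcpy(msg, mb->buf, len);          /* "receive" = copy by loads */
    mb->full = 0;
    pthread_cond_broadcast(&mb->ready);
    pthread_mutex_unlock(&mb->lock);
}
```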
