Multiprocessors - Flynn’s Taxonomy (1966)
• Single Instruction stream, Single Data stream (SISD)
  – Conventional uniprocessor
  – Although ILP is exploited
    • Single Program Counter -> Single Instruction stream
    • The data is not “streaming”
• Single Instruction stream, Multiple Data stream (SIMD)
  – Popular for some applications like image processing
  – One can construe vector processors to be of the SIMD type
  – MMX extensions to ISA reflect the SIMD philosophy
    • Also apparent in “multimedia” processors (Equator Map-1000)
  – “Data Parallel” programming paradigm
Multiprocessors CSE 471

Flynn’s Taxonomy (cont’d)
• Multiple Instruction stream, Single Data stream (MISD)
  – Until recently no processor that really fits this category
  – “Streaming” processors; each processor executes a kernel on a stream of data
  – Maybe VLIW?
• Multiple Instruction stream, Multiple Data stream (MIMD)
  – The most general
  – Covers:
    • Shared-memory multiprocessors
    • Message passing multicomputers (including networks of workstations cooperating on the same problem; grid computing)

Shared-memory Multiprocessors
• Shared-memory = single shared address space (extension of uniprocessor; communication via Load/Store)
• Uniform Memory Access: UMA
  – With a shared bus, it’s the basis for SMPs (Symmetric MultiProcessing)
  – Cache coherence enforced by “snoopy” protocols
  – Number of processors limited by
    • Electrical constraints on the load of the bus
    • Contention for the bus
  – Form the basis for clusters (but in clusters access to memory of other clusters is not UMA)

SMP (Symmetric MultiProcessors aka Multis): Single Shared-bus Systems
[Figure: processors with caches, an I/O adapter, and interleaved memory attached to a shared bus]

Shared-memory Multiprocessors (cont’d)
• Non-uniform memory access: NUMA
  – NUMA-CC: cache coherent (directory-based protocols or SCI)
  – NUMA without cache coherence (coherence enforced by software)
  – COMA: Cache-Only Memory Architecture
  – Clusters
• Distributed Shared Memory: DSM
  – Most often a network of workstations
  – The shared address space is the “virtual address space”
  – O.S. enforces coherence on a page-by-page basis

UMA – Dance-Hall Architectures & NUMA
• Replace the bus by an interconnection network
  – Cross-bar
  – Mesh
  – Perfect shuffle and variants
• Better to improve locality with NUMA. Each processing element (PE) consists of:
  – Processor
  – Cache hierarchy
  – Memory for local data (private) and shared data
• Cache coherence via directory schemes

UMA - Dance-Hall Schematic
[Figure: processors, each with caches, on one side of the interconnect; main memory modules on the other side]

NUMA
[Figure: processors, each with caches and a local memory, connected through the interconnect]

Shared-bus
• Number of devices is limited (length, electrical constraints)
• The longer the bus, the larger the number of devices, but the bus also becomes slower because of
  – Length
  – Contention
• Ultra simplified analysis for contention:
  – Q = processor time between L2 misses; T = bus transaction time
  – Then for 1 processor, bus utilization is $P = T/(T+Q)$
  – For n processors sharing the bus, the probability that the bus is busy is $B(n) = 1 - (1-P)^n$
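The contention analysis above can be tried numerically. The following is a minimal sketch, assuming the n processors issue bus requests independently; the function names and the cycle counts (Q = 95, T = 5) are hypothetical values chosen only for illustration.

```python
def bus_utilization(q: float, t: float) -> float:
    """P = T / (T + Q): fraction of time a single processor keeps the bus busy."""
    return t / (t + q)


def bus_busy_probability(n: int, p: float) -> float:
    """B(n) = 1 - (1 - P)^n, assuming the n processors behave independently."""
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # Hypothetical numbers: 95 cycles of computation between L2 misses,
    # 5 cycles per bus transaction.
    p = bus_utilization(q=95.0, t=5.0)
    for n in (1, 2, 4, 8, 16, 32):
        print(f"n={n:2d}  B(n)={bus_busy_probability(n, p):.3f}")
```

Running the loop shows how quickly the bus saturates as processors are added, which is the reason the number of processors on a shared bus is limited.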

Cross-bars and Direct Interconnection Networks
• Maximum concurrency between n processors and m banks of memory (or cache)
• Complexity grows as $O(n^2)$
• Logically a set of n multiplexers
• But queuing is also needed for contending requests
• Small cross-bars are building blocks for direct interconnection networks
  – Each node is at the same distance from every other node

An O(n log n) network: Butterfly
To go from processor i (xyz in binary) to processor j (uvw), start at i and at each stage k follow either the high link if the kth bit of the destination address is 0 or the low link if it is 1. For example, the path from processor 4 (100) to processor 6 (110) is marked in bold lines.
[Figure: butterfly network connecting processors 0 to 7]
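The routing rule on this slide can be sketched in a few lines. This is an illustration only: the slide does not say from which end the bits are counted, so the sketch assumes stage k examines the kth bit starting from the most significant one, and the function name `butterfly_route` is hypothetical.

```python
from typing import List


def butterfly_route(dest: int, stages: int) -> List[str]:
    """Return the link taken at each stage on the way to processor `dest`.

    Assumption: stage k looks at the kth destination bit, counted from the
    most significant bit. A 0 bit selects the high link, a 1 bit the low link.
    """
    choices = []
    for k in range(stages):
        bit = (dest >> (stages - 1 - k)) & 1
        choices.append("low" if bit else "high")
    return choices


if __name__ == "__main__":
    # Route to processor 6 (110) in an 8-processor, 3-stage butterfly.
    print(butterfly_route(dest=6, stages=3))  # ['low', 'low', 'high']
```

Note that the choices depend only on the destination address, so the same sequence of link selections is used whatever the source processor is.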

Indirect Interconnection Networks
• Nodes are at various distances from each other
• Characterized by their dimension
  – Various routing mechanisms. For example “higher dimension first”
• 2D meshes and tori
• 3D cubes
• More dimensions: hypercubes
• Fat trees
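As an illustration of “higher dimension first” routing, here is a minimal sketch for a mesh without wrap-around links. It assumes nodes are identified by coordinate tuples in which the last coordinate is the highest dimension; that convention and the name `mesh_route` are assumptions made for this example.

```python
from typing import List, Tuple


def mesh_route(src: Tuple[int, ...], dst: Tuple[int, ...]) -> List[Tuple[int, ...]]:
    """Return the nodes visited from src to dst, fixing the highest dimension first.

    Assumption: the last coordinate is the highest dimension, and each hop
    moves by one position along a single dimension (no wrap-around, so this
    models a mesh rather than a torus).
    """
    pos = list(src)
    path = [tuple(pos)]
    for dim in reversed(range(len(pos))):
        step = 1 if dst[dim] > pos[dim] else -1
        while pos[dim] != dst[dim]:
            pos[dim] += step
            path.append(tuple(pos))
    return path


if __name__ == "__main__":
    # 2D mesh: go from node (0, 0) to node (2, 1).
    print(mesh_route((0, 0), (2, 1)))  # [(0, 0), (0, 1), (1, 1), (2, 1)]
```

The number of hops equals the sum of the coordinate differences, which is why nodes in such a network are at various distances from each other.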

[Figure: Mesh]

Message-passing Systems
• Processors communicate by messages
  – Primitives are of the form “send”, “receive”
  – The user (programmer) has to insert the messages
  – Message passing libraries (e.g., MPI)
• Communication can be:
  – Synchronous: The sender must wait for an ack from the receiver (e.g., in RPC)
  – Asynchronous: The sender does not wait for a reply to continue
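The difference between synchronous and asynchronous sends can be sketched on a single machine with Python threads and queues standing in for processors and the network. This is not how a real message passing library is implemented; the helper names (`send_async`, `send_sync`, `receiver`, `demo`) are hypothetical and the queues are an assumed stand-in for communication channels.

```python
import queue
import threading


def receiver(inbox: queue.Queue, acks: queue.Queue, log: list) -> None:
    """Receive messages until a None sentinel arrives; ack those that ask for it."""
    while True:
        item = inbox.get()
        if item is None:
            return
        payload, wants_ack = item
        log.append(payload)
        if wants_ack:
            acks.put(payload)


def send_async(inbox: queue.Queue, payload: str) -> None:
    """Asynchronous send: enqueue the message and continue immediately."""
    inbox.put((payload, False))


def send_sync(inbox: queue.Queue, acks: queue.Queue, payload: str,
              timeout: float = 5.0) -> str:
    """Synchronous send: block until the receiver acknowledges the message."""
    inbox.put((payload, True))
    return acks.get(timeout=timeout)


def demo():
    inbox, acks, log = queue.Queue(), queue.Queue(), []
    worker = threading.Thread(target=receiver, args=(inbox, acks, log))
    worker.start()
    send_async(inbox, "a")            # returns at once
    ack = send_sync(inbox, acks, "b")  # waits for the ack
    inbox.put(None)                    # tell the receiver to stop
    worker.join()
    return log, ack


if __name__ == "__main__":
    print(demo())
```

In the synchronous case the sender cannot proceed until the receiver has handled the message, which is the behavior the slide associates with RPC.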

Shared-memory vs. Message-passing
• An old debate that is no longer that important
• Many systems are built to support a mixture of both paradigms
  – “send, receive” can be supported by O.S. in shared-memory systems
  – “load/store” in virtual address space can be used in a message-passing system (the message passing library can use “small” messages to that effect, e.g. passing a pointer to a memory area in another computer)

The Pros and Cons
• Shared-memory pros
  – Ease of programming (SPMD: Single Program Multiple Data paradigm)
  – Good for communication of small items
  – Less overhead of O.S.
  – Hardware-based cache coherence
• Message-passing pros
  – Simpler hardware (more scalable)
  – Explicit communication (both good and bad; some programming languages have primitives for that), easier for long messages
  – Use of message passing libraries

Caveat about Parallel Processing
• Multiprocessors are used to:
  – Speed up computations
  – Solve larger problems
• Speedup
  – Time to execute on 1 processor / Time to execute on N processors
• Speedup is limited by the communication/computation ratio and synchronization
• Efficiency
  – Speedup / Number of processors
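The two definitions above translate directly into code. A minimal sketch follows; the timings used in the example are made up for illustration and do not come from any measurement.

```python
def speedup(t1: float, tn: float) -> float:
    """Speedup = time on 1 processor / time on N processors."""
    return t1 / tn


def efficiency(t1: float, tn: float, n: int) -> float:
    """Efficiency = speedup / number of processors."""
    return speedup(t1, tn) / n


if __name__ == "__main__":
    # Hypothetical timings: 100 s on 1 processor, 25 s on 8 processors.
    print(speedup(100.0, 25.0))        # 4.0
    print(efficiency(100.0, 25.0, 8))  # 0.5
```

An efficiency of 0.5 in this made-up case means half of the aggregate processor time is lost to communication, synchronization, or sequential work.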

Amdahl’s Law for Parallel Processing
• Recall Amdahl’s law
  – If a fraction x of your program is sequential, speedup is bounded by 1/x
• At best linear speedup (if no sequential section)
• What about superlinear speedup?
  – Theoretically impossible
  – “Occurs” because adding a processor might mean adding more overall memory and caching (e.g., fewer page faults!)
  – Have to be careful about the sequential fraction x. It might become lower if the data set increases.
• Speedup and Efficiency should have the number of processors and the size of the input set as parameters
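The bound can be checked with the usual form of Amdahl’s law, speedup(n) = 1 / (x + (1 - x)/n), where x is the sequential fraction expressed between 0 and 1. The sketch below assumes the parallel part scales perfectly with n and ignores communication and synchronization; the function name is chosen for this example.

```python
def amdahl_speedup(x: float, n: int) -> float:
    """Speedup on n processors when a fraction x (0 <= x <= 1) is sequential.

    Assumes the remaining fraction (1 - x) parallelizes perfectly.
    """
    return 1.0 / (x + (1.0 - x) / n)


if __name__ == "__main__":
    x = 0.1  # hypothetical: 10% of the program is sequential
    for n in (1, 2, 4, 16, 256, 65536):
        print(f"n={n:6d}  speedup={amdahl_speedup(x, n):.2f}")
    print(f"bound 1/x = {1.0 / x:.2f}")
```

With x = 0.1 the speedup approaches but never reaches 10, and with x = 0 the formula reduces to linear speedup, matching the two bullets on the slide.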

Chip MultiProcessors (CMPs)
• Multiprocessors vs. multicores
  – Multiprocessors have private cache hierarchy (on chip)
  – Multicores have shared L2 (on chip)
• How many processors
  – Typically today 2 to 4
  – Tomorrow 8 to 16
  – Next decade ???
• Interconnection
  – Today cross-bar
  – Tomorrow ???
• Biggest problems
  – Programming (parallel programming language)
  – Applications that require parallelism
