Lecture 23: Multiprocessors Todays topics: RAID Multiprocessor - PowerPoint PPT Presentation

Lecture 23: Multiprocessors • Today’s topics: � RAID � Multiprocessor taxonomy � Snooping-based cache coherence protocol 1

RAID 0 and RAID 1 • RAID 0 has no additional redundancy (misnomer) – it uses an array of disks and stripes (interleaves) data across the arrays to improve parallelism and throughput • RAID 1 mirrors or shadows every disk – every write happens to two disks • Reads to the mirror may happen only when the primary disk fails – or, you may try to read both together and the quicker response is accepted • Expensive solution: high reliability at twice the cost 2

RAID 3 • Data is bit-interleaved across several disks and a separate disk maintains parity information for a set of bits • For example: with 8 disks, bit 0 is in disk-0, bit 1 is in disk-1, …, bit 7 is in disk-7; disk-8 maintains parity for all 8 bits • For any read, 8 disks must be accessed (as we usually read more than a byte at a time) and for any write, 9 disks must be accessed as parity has to be re-calculated • High throughput for a single request, low cost for redundancy (overhead: 12.5%), low task-level parallelism 3

RAID 4 and RAID 5 • Data is block interleaved – this allows us to get all our data from a single disk on a read – in case of a disk error, read all 9 disks • Block interleaving reduces thruput for a single request (as only a single disk drive is servicing the request), but improves task-level parallelism as other disk drives are free to service other requests • On a write, we access the disk that stores the data and the parity disk – parity information can be updated simply by checking if the new data differs from the old data 4

RAID 5 • If we have a single disk for parity, multiple writes can not happen in parallel (as all writes must update parity info) • RAID 5 distributes the parity block to allow simultaneous writes 5

RAID Summary • RAID 1-5 can tolerate a single fault – mirroring (RAID 1) has a 100% overhead, while parity (RAID 3, 4, 5) has modest overhead • Can tolerate multiple faults by having multiple check functions – each additional check can cost an additional disk (RAID 6) • RAID 6 and RAID 2 (memory-style ECC) are not commercially employed 6

Multiprocessor Taxonomy • SISD: single instruction and single data stream: uniprocessor • MISD: no commercial multiprocessor: imagine data going through a pipeline of execution engines • SIMD: vector architectures: lower flexibility • MIMD: most multiprocessors today: easy to construct with off-the-shelf computers, most flexibility 7

Memory Organization - I • Centralized shared-memory multiprocessor or Symmetric shared-memory multiprocessor (SMP) • Multiple processors connected to a single centralized memory – since all processors see the same memory organization � uniform memory access (UMA) • Shared-memory because all processors can access the entire memory address space • Can centralized memory emerge as a bandwidth bottleneck? – not if you have large caches and employ fewer than a dozen processors 8

SMPs or Centralized Shared-Memory Processor Processor Processor Processor Caches Caches Caches Caches Main Memory I/O System 9

Memory Organization - II • For higher scalability, memory is distributed among processors � distributed memory multiprocessors • If one processor can directly address the memory local to another processor, the address space is shared � distributed shared-memory (DSM) multiprocessor • If memories are strictly local, we need messages to communicate data � cluster of computers or multicomputers • Non-uniform memory architecture (NUMA) since local memory has lower latency than remote memory 10

Distributed Memory Multiprocessors Processor Processor Processor Processor & Caches & Caches & Caches & Caches Memory I/O Memory I/O Memory I/O Memory I/O Interconnection network 11

SMPs • Centralized main memory and many caches � many copies of the same data • A system is cache coherent if a read returns the most recently written value for that word Time Event Value of X in Cache-A Cache-B Memory 0 - - 1 1 CPU-A reads X 1 - 1 2 CPU-B reads X 1 1 1 3 CPU-A stores 0 in X 0 1 0 12

Cache Coherence A memory system is coherent if: • P writes to X; no other processor writes to X; P reads X and receives the value previously written by P • P1 writes to X; no other processor writes to X; sufficient time elapses; P2 reads X and receives value written by P1 • Two writes to the same location by two processors are seen in the same order by all processors – write serialization • The memory consistency model defines “time elapsed” before the effect of a processor is seen by others 13

Cache Coherence Protocols • Directory-based: A single location (directory) keeps track of the sharing status of a block of memory • Snooping: Every cache block is accompanied by the sharing status of that block – all cache controllers monitor the shared bus so they can update the sharing status of the block, if necessary � Write-invalidate: a processor gains exclusive access of a block before writing by invalidating all other copies � Write-update: when a processor writes, it updates other shared copies of that block 14

Design Issues • Three states for a block: invalid, shared, modified • A write is placed on the bus and sharers invalidate themselves Processor Processor Processor Processor Caches Caches Caches Caches Main Memory I/O System 15

Title • Bullet 16

Lecture 23: Multiprocessors Todays topics: RAID Multiprocessor - PowerPoint PPT Presentation

Lecture 23: Multiprocessors Todays topics: RAID Multiprocessor taxonomy Snooping-based cache coherence protocol 1 RAID 0 and RAID 1 RAID 0 has no additional redundancy (misnomer) it uses an array of disks and stripes

Lecture 23: Virtual Memory, Multiprocessors Todays topics: Virtual memory

Lecture 24: Virtual Memory, Multiprocessors Todays topics: Virtual memory

1 Trends when work was done OS Issues for multiprocessors A period when multiprocessors were

Why Multiprocessors? Limits on the performance of a single processor: what are they? Spring 2009

5 Chip Multiprocessors (II) Chip Multiprocessors (ACS MPhil) Robert Mullins Overview

Shared Memory Multiprocessors Logical design and software interactions 1 Shared Memory

Cap5 - Shared Memory Multiprocessors Logical design and software interactions 1 Shared Memory

4 Chip Multiprocessors (I) Chip Multiprocessors (ACS MPhil) Robert Mullins Overview

Multiprocessors (Chapter 9) Idea: create powerful computers by connecting many smaller ones

Understanding POWER multiprocessors Susmit Sarkar 1 Peter Sewell 1 Jade Alglave 2 , 3 Luc Maranget

Lecture 14: Cilk Shankar Balachandran bshankar@ee.iitb.ac.in The lecture is partly based on

Concurrency On multiprocessors, several threads can execute simultaneously, one on each

Reducing the Interconnection Network Cost of Chip Multiprocessors Pablo Abad , Valentn Puente

Outline Multiprocessors Flynn taxonomy SIMD architectures Vector architectures MIMD

Spin Lock Performance Introduction Shared memory multiprocessors o Various different

Multiprocessors/Multicores Presented by Yue Gao September 26, 2013 Presented by Yue Gao

Issues in Multiprocessors Which programming model for interprocessor communication shared

Scalable Distributed Memory Multiprocessors 1 Outline Scalability physical, bandwidth,

Network Interface Architecture and Prototyping for Chip and Cluster Multiprocessors Supervisor

Analysis and Approximation of Optimal Co-Scheduling on Chip Multiprocessors Yunlian Jiang

7 On-Chip Interconnection Networks Chip Multiprocessors (ACS MPhil) Robert Mullins

Database servers on chip multiprocessors: limitations and opportunities N. Hardavellas N.

ORCA LANGUAGE ABSTRACT Microprocessor based shared-memory multiprocessors are becoming widely

Varying-Speed Multiprocessors Zhishan Guo and Sanjoy Baruah Department of Computer Science, UNC