distributed shared memory



  1. Operating Systems DSM

     Distributed Shared Memory
     • Distributed Shared Memory Systems
       – Page based
       – Shared-variable based
     • Consistency Models
       – Strict consistency
       – Sequential consistency
       – Release consistency

     Distributed Shared Memory
     • Distributed Shared Memory (DSM): a collection of workstations shares a single virtual address space.
     • The main objective of DSM:
       – to ease the burden on the programmer
       – by hiding the fact that physical memory is distributed and not accessible in its entirety to all processors.
     • DSM creates the illusion of a single shared memory,
       – much as virtual memory creates the illusion of a memory that is larger than the available physical memory.
     • Vanilla implementation:
       – References to local pages are handled in hardware.
       – A reference to a remote page causes a hardware page fault: trap to the OS, load the page from the remote machine, restart the faulting instruction.
     • Optimizations:
       – Share only selected portions of memory.
       – Replicate shared variables on multiple machines.
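     The vanilla implementation can be sketched as a small single-process simulation. This is only an illustration, not an OS-level implementation: the `Node` class, the `read_byte` helper, and the 4096-byte page size are assumptions made for the example, and the "page fault" is modeled as a dictionary miss followed by a page migration.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class Node:
    """One workstation holding a subset of the pages of the shared address space."""

    def __init__(self, name):
        self.name = name
        self.pages = {}   # page number -> bytearray with the page contents
        self.faults = 0   # number of simulated page faults taken by this node


def read_byte(node, addr, all_nodes):
    """Read one byte; on a miss, fetch the whole page from its holder, then retry."""
    page, offset = divmod(addr, PAGE_SIZE)
    if page not in node.pages:                      # simulated page fault
        node.faults += 1
        holder = next(n for n in all_nodes if page in n.pages)
        node.pages[page] = holder.pages.pop(page)   # migrate the page to this node
    return node.pages[page][offset]                 # "restart" the access locally


a, b = Node("A"), Node("B")
b.pages[1] = bytearray(PAGE_SIZE)
b.pages[1][5] = 42
value = read_byte(a, PAGE_SIZE + 5, [a, b])   # remote page: one fault
again = read_byte(a, PAGE_SIZE + 5, [a, b])   # now local: no fault
```

     The second read is served locally, which is the point of the scheme: only the first touch of a remote page pays the transfer cost.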

  2. Page-Based DSM
     • NUMA (Non-Uniform Memory Access)
       – A processor can directly reference local and remote memory locations.
       – No software intervention is needed.
     • Workstations on a network
       – can only reference local memory.
     • Goal of DSM
       – Add software that allows NOWs (Networks of Workstations) to run multiprocessor code.
       – Simplicity of programming.

     [Figure: a NUMA machine made up of two NUMA nodes]

  3. Shared-Variable DSM
     • Is it necessary to share the entire address space?
     • Share individual variables instead.
     • More variety in possible update algorithms for replicated variables.
     • Opportunity to eliminate false sharing.

     Design Issues
     • Replication
       – replicate read-only portions
       – replicate read and write portions
     • Granularity
       – restriction: memory portions are multiples of pages
       – pros of large portions:
         • amortize protocol overhead
         • locality of reference
       – cons of large portions:
         • false sharing!

     [Figure: Processor 1 runs code using A, Processor 2 runs code using B; each holds a portion containing both A and B]
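     The false-sharing cost of large portions can be made concrete with a small sketch. The addresses of A and B, the access pattern, and the helper `count_transfers` are all hypothetical; the model simply counts how often a portion has to change hands when two processors alternate writes to unrelated variables.

```python
ADDR_A, ADDR_B = 0x1000, 0x1800   # hypothetical addresses of variables A and B


def falsely_shared(addr_a, addr_b, portion_size):
    """True if two distinct variables fall into the same shared portion."""
    return addr_a // portion_size == addr_b // portion_size


def count_transfers(accesses, portion_size):
    """Count ownership changes for a list of (processor, address) writes."""
    owner = {}        # portion number -> processor that currently holds it
    transfers = 0
    for proc, addr in accesses:
        portion = addr // portion_size
        if portion in owner and owner[portion] != proc:
            transfers += 1            # portion must move to the writing processor
        owner[portion] = proc
    return transfers


# Processor 1 only writes A, processor 2 only writes B, alternating four times.
accesses = [(1, ADDR_A), (2, ADDR_B)] * 4

large = count_transfers(accesses, 4096)   # A and B share one portion
small = count_transfers(accesses, 1024)   # A and B live in different portions
```

     With 4096-byte portions the single portion ping-pongs on every access after the first; with 1024-byte portions no transfer is needed, at the price of more protocol overhead per byte shared.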

  4. Basic Design
     • Emulate the cache of a multiprocessor using the MMU and system software.

     [Figure: pages 0-15 of the shared address space distributed over the local memories of four CPUs]

     Design Issues (cont)
     • Update options: write-update vs. write-invalidate
     • Write-update:
       – Writes made locally are multicast to all copies of the data item.
       – Multiple writers can share the same data item.
       – Consistency depends on the multicast protocol.
         • E.g. sequential consistency is achieved with totally ordered multicast.
       – Reads are cheap.
     • Write-invalidate:
       – Distinguish between read-only items (multiple copies possible) and writable items (only one copy possible).
       – When a process attempts to write to a remote or replicated item, it first sends a multicast message to invalidate the copies and, if necessary, gets a copy of the item.
       – Ensures sequential consistency.
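     The trade-off between the two update options can be sketched by counting messages. The `ReplicatedItem` class below is a hypothetical single-process model: it ignores message ordering and failures, and counts one message per remote copy updated, invalidated, or fetched.

```python
class ReplicatedItem:
    """A data item with one local copy per holding node (simplified model)."""

    def __init__(self, value, holders):
        self.copies = {h: value for h in holders}   # node -> local copy

    def write_update(self, writer, value):
        """Write-update: multicast the new value to every copy; return messages sent."""
        self.copies[writer] = value
        for node in self.copies:
            self.copies[node] = value
        return len(self.copies) - 1

    def write_invalidate(self, writer, value):
        """Write-invalidate: drop all other copies, keep one writable copy."""
        invalidated = [n for n in self.copies if n != writer]
        self.copies = {writer: value}
        return len(invalidated)

    def read(self, reader):
        """Return (value, messages); a miss fetches a copy from a current holder."""
        if reader in self.copies:
            return self.copies[reader], 0
        value = next(iter(self.copies.values()))
        self.copies[reader] = value
        return value, 1


item = ReplicatedItem(0, ["A", "B", "C"])
update_msgs = item.write_update("A", 1)       # B and C receive the new value
read_after_update = item.read("B")            # local hit, no message
inval_msgs = item.write_invalidate("A", 2)    # B and C lose their copies
read_after_inval = item.read("B")             # miss: B must fetch the item again
```

     Under write-update the read after the write is free, which is the sense in which reads are cheap; under write-invalidate the cost is deferred to the next read by a node whose copy was invalidated.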

  5. Protocol for Handling DSM Pages (only one writable copy)

     Operation  Page Location  Page Status  Actions Taken Before Local Read/Write
     Read       Local          Read-only    (none)
     Write      Local          Read-only    Invalidate remote copies; upgrade local copy to writable
     Read       Remote         Read-only    Make local read-only copy
     Write      Remote         Read-only    Invalidate remote copies; make local writable copy
     Read       Local          Writable     (none)
     Write      Local          Writable     (none)
     Read       Remote         Writable     Downgrade page to read-only; make local read-only copy
     Write      Remote         Writable     Transfer remote writable copy to local memory

     Design Issues (cont)
     • Finding the owner
       – Broadcast a request for the owner
         • combine the request with the requested operation
         • problem: a broadcast affects all participants (interrupts all processors) and uses network bandwidth
       – Page manager
         • possible hot spot
         • multiple page managers, hash on page address
       – Probable owner
         • each process keeps track of the probable owner
         • update the probable owner whenever
           – the process transfers ownership of a page
           – the process handles an invalidation request for a page
           – the process receives read access for a page from another process
           – the process receives a request for a page it does not own (it forwards the request to the probable owner and resets the probable owner to the requester)
         • periodically refresh the information about current owners
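     The page-handling protocol above is a pure lookup on (operation, page location, page status), so it can be written down as a table in code. The dictionary `ACTIONS` and the function `actions_before` are names chosen for this sketch; the action strings paraphrase the table, and an empty list means the access can proceed without any protocol action.

```python
# (operation, page location, page status) -> actions taken before the local access
ACTIONS = {
    ("read",  "local",  "read-only"): [],
    ("write", "local",  "read-only"): ["invalidate remote copies",
                                       "upgrade local copy to writable"],
    ("read",  "remote", "read-only"): ["make local read-only copy"],
    ("write", "remote", "read-only"): ["invalidate remote copies",
                                       "make local writable copy"],
    ("read",  "local",  "writable"):  [],
    ("write", "local",  "writable"):  [],
    ("read",  "remote", "writable"):  ["downgrade page to read-only",
                                       "make local read-only copy"],
    ("write", "remote", "writable"):  ["transfer remote writable copy to local memory"],
}


def actions_before(operation, location, status):
    """Return the protocol actions for one access (empty list: proceed directly)."""
    return ACTIONS[(operation, location, status)]


steps = actions_before("read", "remote", "writable")
```

     Only three of the eight cases are free; every write to a page that is not already local and writable triggers at least one remote action.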

  6. Probable Owner Chains

     [Figure: probable-owner pointers among processes A-E, shown for the cases "A issues write" and "A issues read", with an alternative outcome for one of the cases]

     Design Issues (cont)
     • Finding the copies
       – How to find the copies when they must be invalidated?
       – Broadcast requests
         • What if broadcasts are not reliable?
       – Copysets
         • maintained by the page manager or by the owner

     [Figure: copysets recorded for the pages held by five CPUs]
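     How probable-owner hints form and collapse a chain can be sketched as follows. The function `request_write` and the initial hint table are hypothetical; the sketch assumes that the true owner's hint points to itself and that every process that forwards the request resets its hint to the requester, as in the update rule listed under "Finding the owner".

```python
def request_write(prob_owner, requester):
    """Forward a write request along the hint chain; return the number of hops.

    prob_owner maps each process to its hinted owner; the true owner maps to itself.
    """
    hops = 0
    node = requester
    visited = []
    while prob_owner[node] != node:     # node is not the owner: forward the request
        visited.append(node)
        node = prob_owner[node]
        hops += 1
    for p in visited:                   # forwarders now point at the requester
        prob_owner[p] = requester
    prob_owner[node] = requester        # the old owner hands ownership over
    prob_owner[requester] = requester   # the requester is the new owner
    return hops


# Hypothetical chain E -> D -> C -> B -> A, where A owns the page.
hints = {"A": "A", "B": "A", "C": "B", "D": "C", "E": "D"}
first = request_write(hints, "E")    # walks the whole chain
second = request_write(hints, "B")   # B's hint already points at the new owner E
```

     The first request pays for the full chain, but it also shortens every hint it passes, so later requests from the same processes reach the owner in one hop.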

  7. Consistency Models
     • Single copy of a writable page:
       – simple, but expensive for heavily shared pages
     • Multiple copies of a writable page:
       – how to keep the copies consistent?
     • Perfect consistency is expensive.
     • How to relax consistency requirements?
     • Consistency model: a contract between the application and the memory. If the application agrees to obey certain rules, the memory promises to work correctly.

     Consistency Models
     • Strict consistency:
       – A DSM is said to obey strict consistency if reading a variable x always returns the value written to x by the most recently executed write operation.
     • Sequential consistency:
       – A DSM is said to obey sequential consistency if the sequence of values read by the different processes corresponds to some sequential interleaved execution of the same processes.
     • Release consistency

  8. Strict Consistency
     • Most stringent consistency model: any read of a memory location x returns the value stored by the most recent write operation to x.
     • Strict consistency is observed in uniprocessor systems.
     • It has come to be expected by uniprocessor programmers,
       – but is very unlikely to be supported by any multiprocessor.
     • All writes are immediately visible to all processes.
     • Requires that an absolute global time order is maintained.
     • Example of strict consistency:

       Initial: x = 0
       Real time:  t1       t2        t3       t4
       P1:         x = 1;   a1 = x;   x = 2;   b1 = x;
       P2:                  a2 = x;            b2 = x;

     Sequential Consistency
     • Strict consistency is impossible to implement in a distributed system.
     • Programmers can manage with weaker models.
     • Sequential consistency [Lamport 79]: the result of any execution is the same as if the operations of all processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.
     • Any valid interleaving is acceptable, but all processes must see the same sequence of memory references.
     • Sequential consistency does not guarantee that a read returns a value written by another process at any earlier time.
     • Results are not deterministic.

  9. Strict Versus Sequential Consistency

       Initial: x = 0
       Real time:  t1       t2        t3       t4
       P1:         x = 1;   a1 = x;   x = 2;   b1 = x;
       P2:                  a2 = x;            b2 = x;

     • Under strict consistency, both processes, P1 and P2, must see
       1. x = 1 with their first read operation
       2. x = 2 with their second read operation
     • Under sequential consistency, the result is considered correct as long as:
       1. The two read operations of P1 always return the values 1 and 2, respectively.
       2. The two read operations of P2 return any of the following combinations of values: 0 0, 0 1, 0 2, 1 1, 1 2, 2 2.

     Release Consistency
     • Operations:
       – acquire critical region: the critical section (C.S.) is about to be entered.
         • make sure that local copies of variables are made consistent with remote ones.
       – release critical region: the C.S. has just been exited.
         • propagate shared variables to other machines.
       – Operations may apply to a subset of the shared variables.

       P1                                  P2
       lock;                               lock;    (delayed until P1 unlocks)
       x = 1;                              a = x;
       unlock;  (causes propagation of x)  unlock;
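     The set of results that sequential consistency permits for this example can be checked by brute force. The sketch below enumerates every interleaving of the two programs that preserves each program's own order and executes it against a single copy of x; the encoding of operations as ("w", value) and ("r", register) tuples is an assumption of the example.

```python
from itertools import combinations

# ("w", v) writes v to x; ("r", name) reads x into the named register.
P1 = [("w", 1), ("r", "a1"), ("w", 2), ("r", "b1")]
P2 = [("r", "a2"), ("r", "b2")]


def interleavings(p, q):
    """Yield every merge of p and q that keeps each program's own order."""
    n = len(p) + len(q)
    for slots in combinations(range(n), len(p)):   # positions taken by p's operations
        it_p, it_q = iter(p), iter(q)
        yield [next(it_p) if i in slots else next(it_q) for i in range(n)]


def run(schedule):
    """Execute one interleaving on a single copy of x (initially 0)."""
    x, regs = 0, {}
    for op, arg in schedule:
        if op == "w":
            x = arg
        else:
            regs[arg] = x
    return regs


outcomes = [run(s) for s in interleavings(P1, P2)]
p1_results = {(r["a1"], r["b1"]) for r in outcomes}
p2_results = sorted({(r["a2"], r["b2"]) for r in outcomes})
```

     Across the 15 interleavings P1 always reads 1 and then 2, while P2 may read any non-decreasing pair of values from {0, 1, 2}. Note that this includes the pair 0 2, which arises when P2 reads once before x = 1 and once after x = 2.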

  10. Release Consistency (cont)
      • Possible implementation
        – Acquire:
          1. Send a request for the lock to the synchronization processor; wait until granted.
          2. Issue arbitrary reads/writes from/to local copies.
        – Release:
          1. Send modified data to the other machines.
          2. Wait for acknowledgements.
          3. Inform the synchronization processor about the release.
        – Operations on different locks happen independently.
      • Release consistency:
        1. Before an ordinary access to a shared variable is performed, all previous acquires done by the process must have completed successfully.
        2. Before a release is allowed to be performed, all previous reads and writes done by the process must have completed.

      Entry Consistency
      • Operations:
        – acquire critical region: the C.S. is about to be entered.
          • imports only those shared variables pertaining to the current lock
          • unnecessary overhead can be eliminated
          • whereas lazy release consistency imports all shared variables
        – release critical region: the C.S. has just been exited.

        P1                                  P2
        lock;                               lock;    (delayed until P1 unlocks)
        x = 1;                              a = x;
        unlock;  (causes propagation of x)  unlock;
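      The acquire and release steps of the possible implementation can be sketched in a single-process simulation. The classes `LockManager` (standing in for the synchronization processor) and `Node` are hypothetical; propagation is modeled as a direct assignment into the other nodes' local copies, so the acknowledgements are implicit, and an unread variable is assumed to hold 0.

```python
class LockManager:
    """Stands in for the synchronization processor (one lock only)."""

    def __init__(self):
        self.holder = None

    def acquire(self, node):
        if self.holder is not None:
            return False            # lock is taken: the caller has to wait and retry
        self.holder = node
        return True

    def release(self, node):
        assert self.holder is node
        self.holder = None


class Node:
    def __init__(self, name):
        self.name = name
        self.local = {}             # local copies of shared variables
        self.dirty = set()          # variables written since the last release

    def write(self, var, value):
        self.local[var] = value     # ordinary access: touches the local copy only
        self.dirty.add(var)

    def read(self, var):
        return self.local.get(var, 0)

    def release(self, lock, others):
        for other in others:        # 1. send modified data (2. acks are implicit here)
            for var in self.dirty:
                other.local[var] = self.local[var]
        self.dirty.clear()
        lock.release(self)          # 3. inform the synchronization processor


lock = LockManager()
p1, p2 = Node("P1"), Node("P2")
granted_p1 = lock.acquire(p1)
p1.write("x", 1)
blocked_p2 = lock.acquire(p2)       # P2 is delayed until P1 unlocks
stale = p2.read("x")                # P1's write has not been propagated yet
p1.release(lock, [p2])
granted_p2 = lock.acquire(p2)
fresh = p2.read("x")
```

      Between P1's write and P1's release, P2's copy of x is stale; this is acceptable under release consistency because a correctly synchronized P2 only reads x after its own acquire succeeds.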

  11. Consistency Models: Summary

      Consistency  Description
      Strict       Absolute time ordering of all shared accesses matters
      Sequential   All processes see all shared accesses in the same order
      Release      Shared data are made consistent when a critical region is exited
