Cache Coherency
Cache-coherent processors
• the most current value for an address is the value of the last write to it
• every reading processor must get the most current value
Cache coherency problem
• an update by a writing processor is not seen by the other processors
Cache coherency protocols
• a mechanism for maintaining cache coherency
• a coherency state is associated with each block of data
• bus/interconnect operations on shared data change that state
  • for the processor that initiates an operation
  • for other processors that have the operation's data resident in their caches
Winter 2006 CSE 548 - Cache Coherence 1
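The coherency problem can be reproduced with a toy model. The sketch below assumes two private write-back caches attached to one memory with no coherency protocol at all; the class `WriteBackCache` and its methods are hypothetical and exist only for illustration.

```python
# Hypothetical model: two private write-back caches with NO coherency protocol.
class WriteBackCache:
    def __init__(self, memory):
        self.memory = memory  # shared main memory: address -> value
        self.lines = {}       # this cache's copies: address -> value

    def read(self, addr):
        # Miss: fetch from memory. Hit: return the (possibly stale) local copy.
        if addr not in self.lines:
            self.lines[addr] = self.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        # Write-back: only the local copy changes; memory and the other
        # caches are never told about the write.
        self.lines[addr] = value


memory = {0x40: 1}
p1, p2 = WriteBackCache(memory), WriteBackCache(memory)
p2.read(0x40)          # P2 caches the value 1
p1.write(0x40, 2)      # P1 writes 2, but nobody else sees the update
stale = p2.read(0x40)  # P2 hits on its old copy
print(stale)           # prints 1, not the most current value 2
```

P2's read violates the rule that every reader must get the most current value; a coherency protocol exists to prevent exactly this.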
[Figure: A Low-end MP]
Cache Coherency Protocols
Write-invalidate (Sequent Symmetry, SGI Power/Challenge, SPARCCenter 2000)
• a processor obtains exclusive access for writes (becomes the “owner”) by invalidating the data in other processors’ caches
• leads to coherency misses (invalidation misses)
• uses cache-to-cache transfers
• good for:
  • multiple writes to the same word or block by one processor
  • migratory sharing from processor to processor
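A minimal write-invalidate sketch follows. It assumes one-word blocks (so a write never needs to fetch the rest of a block), a single shared bus, and a `[value, dirty]` pair per cached line; the `Bus` and `Cache` classes are hypothetical. It shows the owner taking exclusive access by invalidating other copies, repeated writes by the owner staying off the bus, and the later coherency miss being served by a cache-to-cache transfer.

```python
# Hypothetical model: one-word blocks, one shared bus, write-invalidate.
class Bus:
    def __init__(self, memory):
        self.memory = memory  # address -> value
        self.caches = []      # every cache snooping this bus
        self.log = []         # bus operations, in order

    def supply(self, addr, requester):
        # Cache-to-cache transfer: a cache holding a dirty copy supplies it,
        # writes it back to memory, and keeps a clean (shared) copy.
        for cache in self.caches:
            line = cache.lines.get(addr)
            if cache is not requester and line is not None and line[1]:
                self.memory[addr] = line[0]
                line[1] = False
                return line[0]
        return self.memory.get(addr, 0)


class Cache:
    def __init__(self, name, bus):
        self.name, self.bus = name, bus
        self.lines = {}       # address -> [value, dirty]
        bus.caches.append(self)

    def read(self, addr):
        if addr in self.lines:
            return self.lines[addr][0]          # hit: no bus operation
        self.bus.log.append(("read", self.name, addr))
        value = self.bus.supply(addr, requester=self)
        self.lines[addr] = [value, False]
        return value

    def write(self, addr, value):
        line = self.lines.get(addr)
        if line is None or not line[1]:
            # Not yet the owner: invalidate every other cached copy.
            self.bus.log.append(("invalidate", self.name, addr))
            for other in self.bus.caches:
                if other is not self:
                    other.lines.pop(addr, None)
        # Now the exclusive owner: later writes stay off the bus.
        self.lines[addr] = [value, True]


bus = Bus({0x40: 1})
p1, p2 = Cache("P1", bus), Cache("P2", bus)
p2.read(0x40)          # read miss
p1.write(0x40, 2)      # one invalidate: P1 becomes the owner
p1.write(0x40, 3)      # owner writes again: no bus operation
value = p2.read(0x40)  # coherency miss, served cache-to-cache by P1
```

After this run, `bus.log` holds only three operations (read, invalidate, read) even though P1 wrote twice, which is why write-invalidate suits repeated writes by one processor.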
Cache Coherency Protocols
Write-update (SPARCCenter 2000)
• broadcasts each write to actively shared data
• each processor with a copy snoops the bus and takes the new data
• good for inter-processor contention, where several processors actively read and write the same data
Competitive (Alphas)
• switches between write-invalidate and write-update
We will focus on write-invalidate.
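The trade-off between the two policies can be made concrete by counting bus transactions for one block. This is a deliberately simplified, hypothetical model: every read or write miss costs one transaction (including an update-policy write miss), a write-invalidate write by a non-owner costs one transaction, and a write-update write to a block held by more than one cache costs one broadcast.

```python
def bus_transactions(trace, policy):
    """Count bus transactions for ONE block under a simplified model."""
    holders, owner, count = set(), None, 0
    for proc, op in trace:
        if policy == "invalidate":
            if op == "r" and proc not in holders:
                count += 1             # read miss
                holders.add(proc)
                owner = None           # an exclusive owner drops to shared
            elif op == "w" and owner != proc:
                count += 1             # write miss or invalidate
                holders, owner = {proc}, proc
        elif policy == "update":
            if proc not in holders:
                count += 1             # read or write miss
                holders.add(proc)
            elif op == "w" and len(holders) > 1:
                count += 1             # broadcast the new value to sharers
        else:
            raise ValueError(f"unknown policy: {policy}")
    return count


one_writer = [("P2", "r")] + [("P1", "w")] * 4
producer_consumer = [("P1", "w"), ("P2", "r")] * 4
for name, trace in [("one writer", one_writer), ("producer/consumer", producer_consumer)]:
    print(name, bus_transactions(trace, "invalidate"), bus_transactions(trace, "update"))
```

Under these assumptions the one-writer trace costs 2 transactions with write-invalidate and 5 with write-update, while the alternating producer/consumer trace costs 8 and 5: the reason a competitive scheme might switch between them.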
Cache Coherency Protocol Implementations
Snooping
• used with low-end MPs
  • few processors
  • centralized memory
  • bus-based
• distributed implementation: responsibility for maintaining coherence lies with each cache
Directory-based
• used with higher-end MPs
  • more processors
  • distributed memory
  • multi-path interconnect
• centralized for each address: responsibility for maintaining coherence lies with the directory for that address
Snooping Implementation
A distributed coherency protocol
• a coherency state is associated with each cache block
• each cache's snoop maintains coherency for its own cache
Snooping Implementation
How the bus is used
• it is a broadcast medium
• the entire coherency operation is atomic with respect to other processors
  • keep-the-bus protocol: the master holds the bus until the entire operation has completed
  • split-transaction buses:
    • request and response are different phases
    • a state value indicates that an operation is in progress
    • do not initiate another operation for a cache block that already has one in progress
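The split-transaction rule can be sketched as a per-cache set of blocks that have a request outstanding. The `SplitTransactionController` class and its `issue`/`complete` methods are hypothetical names; a real controller would keep this transient state alongside the block's coherency state.

```python
class SplitTransactionController:
    """Hypothetical per-cache controller for a split-transaction bus."""

    def __init__(self):
        self.pending = set()     # blocks with a request issued but no response yet

    def issue(self, block):
        # Never start a second operation on a block that already has one
        # in progress; the caller must retry later.
        if block in self.pending:
            return False
        self.pending.add(block)  # transient "operation in progress" state
        return True

    def complete(self, block):
        # The response phase arrived: clear the transient state.
        self.pending.discard(block)


ctrl = SplitTransactionController()
ctrl.issue(7)    # request phase for block 7 starts
ctrl.issue(7)    # refused: block 7 already has an operation in progress
ctrl.issue(8)    # a different block may proceed in parallel
ctrl.complete(7)
```

Operations on different blocks can overlap on the bus; only a second operation on the same block must wait for the first one's response.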
Snooping Implementation
Snoop implementation:
• snoop on the highest-level cache (e.g., L2)
  • another reason L2 is physically addressed
  • property of inclusion:
    • all blocks in L1 are also in L2
    • therefore the snoop only has to look in L2
    • the L1 state may need to be updated when the L2 state changes
• separate tags and state for snoop lookups
  • the processor and the snoop communicate on a state or tag change
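Inclusion is what lets the snoop look only at L2. The sketch below is a hypothetical two-level hierarchy (dictionaries mapping block to state, no capacity limits) showing the two obligations: an L2 eviction must also evict the block from L1, and a snoop-induced L2 change is forwarded to L1.

```python
class InclusiveHierarchy:
    """Hypothetical sketch: L1 contents are always a subset of L2."""

    def __init__(self):
        self.l1 = {}   # block -> coherency state
        self.l2 = {}   # block -> coherency state

    def fill(self, block, state):
        # A fill goes into both levels, preserving inclusion.
        self.l2[block] = state
        self.l1[block] = state

    def evict_l2(self, block):
        # Evicting from L2 must also evict from L1, or inclusion breaks.
        self.l2.pop(block, None)
        self.l1.pop(block, None)

    def snoop_invalidate(self, block):
        # The snoop looks up only L2: a block absent from L2 cannot be in L1.
        if block not in self.l2:
            return False
        del self.l2[block]
        self.l1.pop(block, None)   # forward the state change to L1
        return True


h = InclusiveHierarchy()
h.fill(1, "shared")
h.snoop_invalidate(1)   # found in L2, also removed from L1
h.snoop_invalidate(2)   # L2 miss, so L1 is never consulted
```

Without inclusion the snoop would also need L1's tags on every bus operation, competing with the processor for them.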
An Example Snooping Protocol
An invalidation-based coherency protocol
Each cache block is in one of three states
• shared:
  • clean in all caches and up-to-date in memory
  • the block can be read by any processor
• exclusive:
  • dirty in exactly one cache
  • only that processor can write to it
• invalid:
  • the block contains no valid data
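The three state definitions imply an invariant across caches for any one block: at most one cache holds it exclusive, and if one does, no other cache holds a shared copy. A minimal checker, assuming a hypothetical `State` enum and a list with one state per cache:

```python
from enum import Enum


class State(Enum):
    INVALID = "invalid"
    SHARED = "shared"
    EXCLUSIVE = "exclusive"


def is_coherent(states):
    """Check one block's states across all caches against the protocol's
    definitions: exclusive means dirty in exactly one cache, so an
    exclusive copy cannot coexist with any other valid copy."""
    n_exclusive = states.count(State.EXCLUSIVE)
    n_shared = states.count(State.SHARED)
    return n_exclusive == 0 or (n_exclusive == 1 and n_shared == 0)


print(is_coherent([State.SHARED, State.SHARED, State.INVALID]))  # True
print(is_coherent([State.EXCLUSIVE, State.SHARED]))              # False
```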
State Transitions for a Given Cache Block
State transitions are caused by:
• events caused by the requesting processor, e.g.,
  • read miss, write miss, write on a shared block
• events caused by snoops of other caches, e.g.,
  • a read miss by P1 makes P2’s owned block change from exclusive to shared
  • a write miss by P1 makes P2’s owned block change from exclusive to invalid
State Machine (CPU side)
• Invalid
  • CPU read miss → Shared; place read op on bus
  • CPU write → Exclusive; place write op on bus
• Shared (read only)
  • CPU read hit → Shared
  • CPU read miss → Shared; place read op on bus
  • CPU write → Exclusive; place write op on bus
• Exclusive (read/write)
  • CPU read hit, CPU write hit → Exclusive
  • CPU read miss → Shared; write-back block, place read op on bus
  • CPU write miss → Exclusive; write-back cache block, place write op on bus
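The CPU-side transitions can be written as a single function. In this sketch (the `State` enum and `cpu_event` are hypothetical names), a miss in a valid state means the frame holds a different block, which is replaced; that is why an Exclusive miss writes the dirty victim back first.

```python
from enum import Enum


class State(Enum):
    INVALID = "invalid"
    SHARED = "shared"          # read only
    EXCLUSIVE = "exclusive"    # read/write


def cpu_event(state, op, hit):
    """Return (next_state, bus_actions) for a CPU read or write.

    op is "read" or "write"; hit says whether the tag matched.  A miss in a
    valid state means the frame holds another block, which is replaced.
    """
    actions = []
    if state is State.EXCLUSIVE and not hit:
        actions.append("write-back block")    # the dirty victim leaves the cache
    if op == "read":
        if hit and state is not State.INVALID:
            return state, actions             # read hit: no bus traffic
        actions.append("place read op on bus")
        return State.SHARED, actions
    if op == "write":
        if hit and state is State.EXCLUSIVE:
            return state, actions             # write hit on an owned block
        actions.append("place write op on bus")  # gain exclusive ownership
        return State.EXCLUSIVE, actions
    raise ValueError(f"unknown op: {op}")


print(cpu_event(State.SHARED, "write", True))
print(cpu_event(State.EXCLUSIVE, "read", False))
```

A write hit on a Shared block still goes to the bus, because the other caches' copies must be invalidated before this cache may write.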
State Machine (Bus side: the snoop)
• Shared (read only)
  • write miss for this block → Invalid
• Exclusive (read/write)
  • read miss for this block → Shared; write-back the block
  • write miss for this block → Invalid; write-back the block
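The snoop side reacts to another processor's misses for the same block. A sketch under the same hypothetical `State` enum, where `snoop_event` receives the observed bus operation:

```python
from enum import Enum


class State(Enum):
    INVALID = "invalid"
    SHARED = "shared"
    EXCLUSIVE = "exclusive"


def snoop_event(state, bus_op):
    """Return (next_state, actions) when the snoop sees ANOTHER processor's
    miss for this block on the bus. bus_op is "read miss" or "write miss"."""
    writeback = ["write-back the block"] if state is State.EXCLUSIVE else []
    if bus_op == "write miss":
        return State.INVALID, writeback     # another processor will own it
    if bus_op == "read miss":
        if state is State.EXCLUSIVE:
            return State.SHARED, writeback  # memory made current; keep a shared copy
        return state, []                    # shared or invalid: unchanged
    raise ValueError(f"unknown bus op: {bus_op}")


print(snoop_event(State.EXCLUSIVE, "read miss"))
print(snoop_event(State.SHARED, "write miss"))
```

Only an Exclusive block ever writes back on a snoop, since it is the only state in which memory is stale.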
Directory Implementation
Distributed memory
• each processor (or cluster of processors) has its own memory
• processor-memory pairs are connected via a multi-path interconnection network
  • snooping with broadcasting is wasteful
  • point-to-point communication is used instead
• a processor has fast access to its local memory and slower access to “remote” memory located at other processors
  • NUMA (non-uniform memory access) machines
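Why broadcasting is wasteful can be seen by counting who must be contacted when a processor writes a block. The sketch assumes a hypothetical 64-processor machine in which the block is cached by only two other processors; the function names are illustrative.

```python
def snooping_targets(n_procs, requester):
    # Broadcast: every other cache must see (and look up) the request.
    return [p for p in range(n_procs) if p != requester]


def directory_targets(sharers, requester):
    # Point-to-point: only the caches the directory records as holding the block.
    return sorted(p for p in sharers if p != requester)


# Hypothetical machine: processor 5 writes a block cached by processors 3 and 17.
print(len(snooping_targets(64, 5)), directory_targets({3, 17}, 5))  # 63 [3, 17]
```

The broadcast cost grows with the machine size, while the point-to-point cost grows only with the number of actual sharers.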
[Figure: A High-end MP: nodes, each with a processor and cache (Proc $), memory (Mem), and a directory (Dir), connected by an interconnection network]
Directory Implementation
How cache coherency is handled
• no caches (Cray MTA)
• disallow caching of shared data (Cray T3D)
• software coherence
• hardware directories that record cache block state
Directory Implementation
Coherency state is associated with memory blocks that are the size of cache blocks
• cache state
  • shared:
    • at least one processor has the data cached and memory is up-to-date
    • the block can be read by any processor
  • exclusive:
    • one processor (the owner) has the data cached and memory is stale
    • only that processor can write to it
  • invalid:
    • no processor has the data cached and memory is up-to-date
• directory state
  • a bit vector in which a 1 means the corresponding processor has cached the data
  • a write bit indicating whether the block is exclusive
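A directory entry can be sketched as a presence bit vector plus a write bit, with the block's state derived from the two. `DirectoryEntry` and its methods are hypothetical; the message handling that decides when to call them is left to the protocol.

```python
class DirectoryEntry:
    """Hypothetical directory entry for one cache-block-sized memory block."""

    def __init__(self):
        self.presence = 0        # bit p set => processor p has the block cached
        self.exclusive = False   # write bit: set when one processor owns it dirty

    def state(self):
        if self.presence == 0:
            return "invalid"     # uncached, memory up-to-date
        if self.exclusive:
            return "exclusive"   # one owner, memory stale
        return "shared"          # one or more readers, memory up-to-date

    def add_sharer(self, p):
        self.presence |= 1 << p

    def set_owner(self, p):
        self.presence = 1 << p   # the owner is the only holder
        self.exclusive = True

    def sharers(self):
        return [p for p in range(self.presence.bit_length()) if (self.presence >> p) & 1]


e = DirectoryEntry()
e.add_sharer(1)
e.add_sharer(3)
print(e.state(), e.sharers())   # shared [1, 3]
e.set_owner(2)
print(e.state(), e.sharers())   # exclusive [2]
```

One bit per processor keeps lookups simple but makes each entry grow linearly with the processor count.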
Directory Implementation
Directories have different uses for different processors
• home node: where the memory location of an address resides (cached data may be there too)
• local node: where the memory request originated
• remote node: an alternate location for the data, if this processor has requested and cached it
In satisfying a memory request:
• messages are sent between the different nodes in point-to-point communication
• messages get explicit replies
Some simplifying assumptions for using the protocol
• a processor blocks until its access is complete
• messages are processed in the order received
Read Miss for an Uncached Block
[Figure: nodes P1, P2, P3, P4, each with a cache, memory, and directory, on an interconnection network]
1: read miss (P1 → home directory)
2: data value reply (home directory → P1)
Read Miss for an Exclusive, Remote Block
[Figure: nodes P1, P2, P3, P4, each with a cache, memory, and directory, on an interconnection network]
1: read miss (P1 → home directory)
2: fetch (home directory → remote owner)
3: data write-back (remote owner → home directory)
4: data value reply (home directory → P1)
Write Miss for an Exclusive, Remote Block
[Figure: nodes P1, P2, P3, P4, each with a cache, memory, and directory, on an interconnection network]
1: write miss (P1 → home directory)
2: fetch & invalidate (home directory → remote owner)
3: data write-back (remote owner → home directory)
4: data value reply (home directory → P1)
Directory Protocol Messages
• Read miss (local cache → home directory; content P, A): processor P reads data at address A; make P a read sharer and arrange to send the data back
• Write miss (local cache → home directory; content P, A): processor P writes data at address A; make P the exclusive owner and arrange to send the data back
• Invalidate (home directory → remote caches; content A): invalidate a shared copy at address A
• Fetch (home directory → remote cache; content A): fetch the block at address A and send it to its home directory
• Fetch/Invalidate (home directory → remote cache; content A): fetch the block at address A and send it to its home directory; invalidate the block in the cache
• Data value reply (home directory → local cache; content Data): return a data value from the home memory (read or write miss response)
• Data write-back (remote cache → home directory; content A, Data): write back a data value for address A (invalidate response)
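The message types combine into the request flows from the three figures. This sketch assumes a hypothetical directory entry stored as a dict with a set of `sharers` and an `owner` (or `None`), and returns the ordered `(message, source, destination)` triples for a read or write miss from node `p`; it omits real addresses and data payloads.

```python
def read_miss(entry, home, p):
    """Ordered messages for a read miss by node p (hypothetical entry format:
    {"sharers": set of nodes, "owner": node or None})."""
    steps = [("read miss", p, home)]
    owner = entry["owner"]
    if owner is not None:
        # Exclusive, remote block: memory is stale, so fetch it from the owner.
        steps.append(("fetch", home, owner))
        steps.append(("data write-back", owner, home))
        entry["owner"] = None
        entry["sharers"] = {owner}        # the old owner keeps a shared copy
    entry["sharers"].add(p)
    steps.append(("data value reply", home, p))
    return steps


def write_miss(entry, home, p):
    """Ordered messages for a write miss by node p."""
    steps = [("write miss", p, home)]
    owner = entry["owner"]
    if owner is not None:
        # Exclusive, remote block: fetch it and invalidate the owner's copy.
        steps.append(("fetch/invalidate", home, owner))
        steps.append(("data write-back", owner, home))
    else:
        # Shared block: invalidate every other sharer (their copies are clean).
        for s in sorted(entry["sharers"] - {p}):
            steps.append(("invalidate", home, s))
    entry["sharers"], entry["owner"] = {p}, p
    steps.append(("data value reply", home, p))
    return steps


uncached = {"sharers": set(), "owner": None}
print(read_miss(uncached, "home", "P1"))   # 2 messages
owned = {"sharers": {"P3"}, "owner": "P3"}
print(write_miss(owned, "home", "P1"))     # 4 messages
```

The uncached read miss needs two messages and the exclusive, remote cases need four, matching the numbered steps in the figures.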
CPU FSM for a Cache Block
States are identical to the snooping protocol
Transactions are very similar
• read and write misses are sent to the home directory
• invalidate and data-fetch requests sent to the node(s) holding the data replace broadcast read/write misses
CPU FSM for a Cache Block
• Invalid
  • CPU read → Shared; send read miss
  • CPU write → Exclusive; send write miss
• Shared (read only)
  • CPU read hit → Shared
  • CPU read miss → Shared; send read miss
  • CPU write → Exclusive; send invalidate (write miss)
  • Invalidate → Invalid
• Exclusive (read/write)
  • CPU read hit, CPU write hit → Exclusive
  • Fetch → Shared; send data write-back
  • Fetch/Invalidate → Invalid; send data write-back
  • CPU read miss → Shared; send data write-back, send read miss
  • CPU write miss → Exclusive; send data write-back message, send write miss message
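Because the directory version reacts to both CPU events and messages from the home directory, a transition table is a compact way to express it. The event names and the `next_state` helper below are hypothetical; the Shared-state CPU write is modeled as sending a write miss, which the home directory treats as a request to invalidate the other copies.

```python
# (state, event) -> (next_state, messages sent); events prefixed "cpu_" come
# from the local processor, the others arrive from the home directory.
TRANSITIONS = {
    ("invalid", "cpu_read"):            ("shared", ("read miss",)),
    ("invalid", "cpu_write"):           ("exclusive", ("write miss",)),
    ("shared", "cpu_read_hit"):         ("shared", ()),
    ("shared", "cpu_read_miss"):        ("shared", ("read miss",)),
    ("shared", "cpu_write"):            ("exclusive", ("write miss",)),
    ("shared", "invalidate"):           ("invalid", ()),
    ("exclusive", "cpu_read_hit"):      ("exclusive", ()),
    ("exclusive", "cpu_write_hit"):     ("exclusive", ()),
    ("exclusive", "fetch"):             ("shared", ("data write-back",)),
    ("exclusive", "fetch_invalidate"):  ("invalid", ("data write-back",)),
    ("exclusive", "cpu_read_miss"):     ("shared", ("data write-back", "read miss")),
    ("exclusive", "cpu_write_miss"):    ("exclusive", ("data write-back", "write miss")),
}


def next_state(state, event):
    """Look up one transition; unlisted pairs are treated as protocol errors."""
    if (state, event) not in TRANSITIONS:
        raise ValueError(f"no transition for {event!r} in state {state!r}")
    nxt, msgs = TRANSITIONS[(state, event)]
    return nxt, list(msgs)


print(next_state("exclusive", "fetch"))          # ('shared', ['data write-back'])
print(next_state("exclusive", "cpu_read_miss"))  # ('shared', ['data write-back', 'read miss'])
```

A table keeps every transition in one place, so a missing or unexpected (state, event) pair surfaces as an explicit error rather than silently falling through.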