Relaxed Persist Ordering Using Strand Persistency
Vaibhav Gogte, William Wang$, Stephan Diestelhorst$, Peter M. Chen, Satish Narayanasamy, Thomas F. Wenisch$
ISCA 2020
Promise of persistent memory (PM)
• Performance
• Density: “Optane DC Persistent Memory will be offered in packages of up to 512GB per stick.” “… expanding memory per CPU socket to as much as 3TB.” (Source: www.extremetech.com)
• Non-volatility
Byte-addressable, load-store interface to durable storage
Persistent memory system
CPU → writeback caches → DRAM and persistent memory (PM)
After a failure, recovery can inspect PM data structures to restore the system to a consistent state.
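Recovery requires PM access ordering
• Program: St a = x; St b = y, where recovery relies on a persisting before b
• Consistency model: governs the order in which stores become visible
• Persistency model: governs the order in which stores persist to PM through the writeback caches
• Intel x86 primitives: St a = x; CLWB(a); SFENCE; St b = y; CLWB(b). CLWB writes a cache line back toward PM, and SFENCE orders the earlier CLWB before later stores.
Hardware systems provide primitives to express persist order to PM.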
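To make the roles of CLWB and SFENCE concrete, here is a minimal Python sketch under a simplified, assumed model of x86 persist ordering: CLWBs between the same pair of SFENCEs may persist in any order, a CLWB after an SFENCE persists only after every CLWB before it, and plain stores are not tracked. The helper names `x86_persist_order` and `crash_states` are hypothetical. A set of lines can be durable at a crash only if it contains every persist-order predecessor of each durable line, so the fence rules out the state where b is durable but a is not.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

Edge = Tuple[str, str]


def x86_persist_order(trace: List[str]) -> Set[Edge]:
    """Persist-before edges implied by CLWB/SFENCE in one thread.

    Simplified, assumed model: a CLWB after an SFENCE persists after every
    CLWB issued before that SFENCE; CLWBs in the same epoch are unordered.
    Plain stores are ignored.
    """
    edges: Set[Edge] = set()
    fenced: List[str] = []   # CLWBs issued before the most recent SFENCE
    epoch: List[str] = []    # CLWBs issued since the most recent SFENCE
    for op in trace:
        if op == "SFENCE":
            fenced.extend(epoch)
            epoch = []
        elif op.startswith("CLWB(") and op.endswith(")"):
            addr = op[len("CLWB("):-1]
            edges.update((prev, addr) for prev in fenced)
            epoch.append(addr)
    return edges


def crash_states(addrs: List[str], edges: Set[Edge]) -> List[FrozenSet[str]]:
    """Sets of lines that may be durable after a crash: a line can be durable
    only if all of its persist-order predecessors are durable too."""
    states = []
    for r in range(len(addrs) + 1):
        for subset in combinations(addrs, r):
            durable = set(subset)
            if all(a in durable for a, b in edges if b in durable):
                states.append(frozenset(durable))
    return states


fenced_trace = ["St a = x", "CLWB(a)", "SFENCE", "St b = y", "CLWB(b)"]
unfenced_trace = ["St a = x", "CLWB(a)", "St b = y", "CLWB(b)"]

print(x86_persist_order(fenced_trace))    # {('a', 'b')}
print(x86_persist_order(unfenced_trace))  # set()
print(frozenset({"b"}) in crash_states(["a", "b"], x86_persist_order(fenced_trace)))  # False
```

Without the SFENCE, all four subsets of {a, b} are possible crash states, including the one recovery cannot handle.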
Hardware imposes overly strict constraints
• Program: St A = 1; CLWB(A); St B = 2; CLWB(B); St C = 3; CLWB(C)
• Ideal DAG: A → B; C is unordered with respect to A and B
• DAG 1 (St A = 1; CLWB(A); SFENCE; St B = 2; CLWB(B); St C = 3; CLWB(C)): A → B and A → C
• DAG 2 (St A = 1; CLWB(A); St C = 3; CLWB(C); SFENCE; St B = 2; CLWB(B)): A → B and C → B
Primitives in existing hardware systems overconstrain PM accesses: every SFENCE placement that keeps A → B also orders C.
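One way to quantify the overconstraint is to count the total persist orders each DAG admits. The sketch below assumes the ideal DAG contains only the edge A → B (the one edge DAG 1 and DAG 2 share) and enumerates permutations; `allowed_orders` is a hypothetical helper, and brute-force enumeration is only practical for a handful of persists.

```python
from itertools import permutations
from typing import List, Set, Tuple

Edge = Tuple[str, str]


def allowed_orders(nodes: Tuple[str, ...], edges: Set[Edge]) -> List[Tuple[str, ...]]:
    """Every total order of persists that respects all persist-before edges."""
    orders = []
    for perm in permutations(nodes):
        position = {n: i for i, n in enumerate(perm)}
        if all(position[a] < position[b] for a, b in edges):
            orders.append(perm)
    return orders


NODES = ("A", "B", "C")
IDEAL = {("A", "B")}               # assumed: only A must persist before B
DAG1 = {("A", "B"), ("A", "C")}    # SFENCE placed after CLWB(A)
DAG2 = {("A", "B"), ("C", "B")}    # C moved above the SFENCE

for name, dag in (("ideal", IDEAL), ("DAG 1", DAG1), ("DAG 2", DAG2)):
    print(name, len(allowed_orders(NODES, dag)), "orders; superset of ideal:", IDEAL <= dag)
```

Each SFENCE placement keeps the required edge A → B but discards one of the three orders the ideal DAG allows, and with it some persist concurrency.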
Contributions
• Our proposal: StrandWeaver
  – Builds the strand persistency model in hardware
  – Specifies precise persist ordering constraints
• Comprises primitives: PersistBarrier, NewStrand, and JoinStrand
  – Can encode an arbitrary DAG
• Maps language-level persistency models to ISA-level primitives
  – Leverages hardware primitives to build persistency models efficiently
StrandWeaver yields a 1.45x average speedup over Intel x86.
Outline
• Contributions
• Example: Failure atomicity
• Existing hardware vs. strand persistency model
• Our proposal: StrandWeaver
• Evaluation
Example: Failure atomicity
Failure atomicity: which group of stores persists atomically?
atomic_begin()
    x = 100;    (failure-atomic region)
    y = 200;
atomic_end()
Failure atomicity limits the state that recovery can observe after a failure: it sees either both updates or neither.
Undo logging for failure atomicity
Init: x = 0; y = 0
atomic_begin()
    x = 1;    (failure-atomic region)
    y = 2;
atomic_end()
Steps: persistUndoLog (L) → mutateData (M) → persistData (P) → commitLog (C)
Undo logging steps are ordered to ensure failure atomicity.
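The recovery guarantee can be checked with a small crash-injection sketch. It models PM as a Python dict and assumes each step persists atomically and in the listed order, which is exactly what the persist-ordering primitives must enforce; mutateData and persistData are merged because every dict write counts as a persist. The step encoding and the `log`/`log_valid` fields are hypothetical. Crashing after every prefix of the steps and then running recovery should only ever expose the initial state (0, 0) or the final state (1, 2).

```python
from typing import Any, Dict, List, Optional, Set, Tuple

Step = Tuple[str, Optional[str], Any]


def undo_log_steps(updates: Dict[str, int]) -> List[Step]:
    """Ordered persists for one undo-logged failure-atomic region."""
    steps: List[Step] = [("log", key, None) for key in updates]        # persistUndoLog (L)
    steps.append(("log_valid", None, True))                            # log now usable by recovery
    steps += [("data", key, value) for key, value in updates.items()]  # mutateData + persistData (M, P)
    steps.append(("log_valid", None, False))                           # commitLog (C)
    return steps


def apply(pm: Dict[str, Any], step: Step) -> None:
    kind, key, value = step
    if kind == "log":
        pm["log"][key] = pm[key]          # save the old value
    elif kind == "log_valid":
        pm["log_valid"] = value
    else:
        pm[key] = value


def recover(pm: Dict[str, Any]) -> None:
    """Roll back an uncommitted region using the undo log."""
    if pm["log_valid"]:
        for key, old in pm["log"].items():
            pm[key] = old
        pm["log_valid"] = False


def crash_states(steps: List[Step], init: Dict[str, int]) -> Set[Tuple[Any, ...]]:
    """States recovery exposes after a crash following each prefix of steps."""
    seen = set()
    for k in range(len(steps) + 1):
        pm: Dict[str, Any] = {**init, "log": {}, "log_valid": False}
        for step in steps[:k]:
            apply(pm, step)
        recover(pm)
        seen.add(tuple(pm[key] for key in init))
    return seen


INIT = {"x": 0, "y": 0}
good = undo_log_steps({"x": 1, "y": 2})
print(sorted(crash_states(good, INIT)))  # [(0, 0), (1, 2)]

# Persisting data before the log is marked valid breaks failure atomicity.
bad = good[:2] + good[3:5] + [good[2]] + good[5:]
print((1, 0) in crash_states(bad, INIT))  # True
```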
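Hardware imposes stricter constraints
Program: atomic_begin(); x = 1; y = 2; atomic_end()
• Ideal ordering: Log(Lx,x); CLWB(Lx) before Store(x,1), and Log(Ly,y); CLWB(Ly) before Store(y,2); the two log/store pairs are otherwise unordered
• SFENCE ordering: Log(Lx,x); CLWB(Lx); SFENCE; Store(x,1); Log(Ly,y); CLWB(Ly); SFENCE; Store(y,2)
The fences force extra orderings between the two updates (for example, Store(x,1) before Store(y,2)) that failure atomicity does not require.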
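A rough way to see the cost is the length of the longest persist-ordering chain, i.e. how many persists must complete one after another if unordered persists overlap perfectly. This sketch assumes the SFENCE sequence Log(Lx,x); CLWB(Lx); SFENCE; Store(x,1); Log(Ly,y); CLWB(Ly); SFENCE; Store(y,2), treats each log entry and each store as a single persist, and uses hypothetical helper names; it ignores flush latency and bandwidth.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def edges_from_epochs(epochs: List[List[str]]) -> Set[Edge]:
    """Every persist in an earlier SFENCE epoch precedes every persist in a later one."""
    edges: Set[Edge] = set()
    for i, earlier in enumerate(epochs):
        for later in epochs[i + 1:]:
            edges.update((a, b) for a in earlier for b in later)
    return edges


def critical_path(nodes: List[str], edges: Set[Edge]) -> int:
    """Longest chain of ordered persists, i.e. the number of persist rounds
    needed if all unordered persists overlap perfectly."""
    depth: Dict[str, int] = {}

    def visit(node: str) -> int:
        if node not in depth:
            preds = [a for a, b in edges if b == node]
            depth[node] = 1 + max((visit(p) for p in preds), default=0)
        return depth[node]

    return max(visit(n) for n in nodes)


NODES = ["Lx", "x", "Ly", "y"]
IDEAL = {("Lx", "x"), ("Ly", "y")}  # each log entry before its own data
# Assumed per-update SFENCE sequence, grouped into epochs between fences.
SFENCE_ORDER = edges_from_epochs([["Lx"], ["x", "Ly"], ["y"]])

print(critical_path(NODES, IDEAL))         # 2
print(critical_path(NODES, SFENCE_ORDER))  # 3
print(sorted(SFENCE_ORDER - IDEAL))        # [('Lx', 'Ly'), ('Lx', 'y'), ('x', 'y')]
```

The gap grows with the number of updates: under this model, n per-update fences serialize n + 1 persist rounds, while the ideal ordering stays at 2.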
StrandWeaver: Hardware Strand Persistency Model
• High-level languages: failure atomicity for language-level persistency models
• Compiler: logging implementations that map to hardware primitives
• Hardware ISA: ISA primitives PersistBarrier, NewStrand, JoinStrand
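As an illustration of the compiler layer, here is one hypothetical way to lower an undo-logged region to the three ISA primitives: each log/data pair gets its own strand, a PersistBarrier orders each log entry before its data, and a JoinStrand orders every update before the commit record. The string encoding, the `lower_undo_tx` name, and the `commit` record are assumptions for this sketch, not StrandWeaver's actual code generation.

```python
from typing import Dict, List


def lower_undo_tx(updates: Dict[str, int]) -> List[str]:
    """Emit a hypothetical StrandWeaver instruction sequence for one
    undo-logged failure-atomic region."""
    ops: List[str] = []
    for key, value in updates.items():
        ops.append("NewStrand")              # independent updates persist concurrently
        ops.append(f"Log(L{key},{key})")     # persistUndoLog: record the old value
        ops.append(f"CLWB(L{key})")
        ops.append("PersistBarrier")         # log entry persists before the data
        ops.append(f"Store({key},{value})")  # mutateData
        ops.append(f"CLWB({key})")           # persistData
    ops.append("JoinStrand")                 # every strand completes before the commit
    ops.append("Store(commit,1)")            # commitLog (hypothetical commit record)
    ops.append("CLWB(commit)")
    return ops


print("; ".join(lower_undo_tx({"x": 1, "y": 2})))
```

Compared with per-update SFENCEs, nothing here orders the x pair against the y pair; only the commit waits for both.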
StrandWeaver enables persist concurrency
Provides primitives to express precise persist order:
• PersistBarrier: orders persists within a strand
• NewStrand: initiates a new stream of persists
• JoinStrand: merges previously initiated strands
Example: Strand 0 persists A, then (after a PersistBarrier) B. NewStrand starts Strand 1, which persists C concurrently with A and B. After JoinStrand, D persists only after A, B, and C.
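The primitive semantics can be captured in a few lines. This is a simplified sketch that assumes a single thread and ignores ordering between persists to the same address; the op strings and the `strand_persist_order` name are hypothetical. On the example above it yields A → B plus A, B, C → D, and the three-persist program from the x86 comparison yields exactly the ideal DAG A → B.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]


def strand_persist_order(program: List[str]) -> Set[Edge]:
    """Persist-before edges under a simplified, single-thread strand model.

    Ops: 'Persist X', 'PersistBarrier', 'NewStrand', 'JoinStrand'.
    Same-address and inter-thread ordering are not modeled.
    """
    edges: Set[Edge] = set()
    issued: List[str] = []   # every persist issued so far
    joined: List[str] = []   # persists ordered before everything after the last JoinStrand
    barrier: List[str] = []  # persists before the last PersistBarrier on this strand
    strand: List[str] = []   # persists on the current strand
    for op in program:
        if op.startswith("Persist "):
            x = op.split()[1]
            edges.update((p, x) for p in joined + barrier)
            strand.append(x)
            issued.append(x)
        elif op == "PersistBarrier":
            barrier = list(strand)
        elif op == "NewStrand":
            barrier, strand = [], []     # a new strand starts with no intra-strand order
        elif op == "JoinStrand":
            joined = list(issued)        # all earlier persists, on any strand
            barrier, strand = [], []
    return edges


slide = ["Persist A", "PersistBarrier", "Persist B", "NewStrand",
         "Persist C", "JoinStrand", "Persist D"]
print(sorted(strand_persist_order(slide)))
# [('A', 'B'), ('A', 'D'), ('B', 'D'), ('C', 'D')]

ideal = ["Persist A", "PersistBarrier", "Persist B", "NewStrand", "Persist C"]
print(strand_persist_order(ideal))  # {('A', 'B')}
```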
StrandWeaver architecture
CPU components: Load-Store Queue, Persist Queue, Strand Buffer Unit (SB0, SB1, …, SBn), L1 Cache
• Persist queue
  – Manages in-flight StrandWeaver primitives
  – Orders CLWBs separated by JoinStrand
• Strand buffer unit
  – Issues CLWBs and flushes dirty cache lines
  – Ensures CLWBs on different strands are concurrent
  – Monitors coherence requests for inter-thread order
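Running example
Example code (also the persist queue contents): CLWB(A); NewStrand; CLWB(B); JoinStrand; CLWB(C)
The persist queue sends CLWB(A) to strand buffer SB0. NewStrand advances the buffer index, so CLWB(B) goes to SB1 and can flush concurrently with A. CLWB(C) follows a JoinStrand, so the persist queue holds it until A and B complete.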
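The running example can be mimicked with a toy dispatch model that assigns each CLWB a strand buffer and a "wave". The rules are assumptions for illustration, not StrandWeaver's actual microarchitecture: PersistBarrier starts a new wave in the current buffer, NewStrand moves round-robin to the next buffer, JoinStrand restarts every buffer above all earlier waves, and a strand that wraps onto a busy buffer is conservatively queued behind its contents. The `dispatch` name and the wave abstraction are hypothetical.

```python
from typing import Dict, List, Tuple


def dispatch(persist_queue: List[str], num_buffers: int = 2) -> Dict[str, Tuple[int, int]]:
    """Map each CLWB to (strand buffer index, wave) under assumed rules.

    Within one buffer, a CLWB waits for that buffer's lower waves. After a
    JoinStrand, waves restart above every earlier CLWB, so later CLWBs wait
    for all of them. CLWBs in different buffers are otherwise independent.
    """
    placement: Dict[str, Tuple[int, int]] = {}
    idx = 0                        # current strand buffer ("Buffer Idx")
    base = 0                       # first wave allowed after the last JoinStrand
    wave = [base] * num_buffers    # wave for new CLWBs in each buffer
    busy = [False] * num_buffers   # buffer holds CLWBs in its current wave
    for op in persist_queue:
        if op.startswith("CLWB(") and op.endswith(")"):
            placement[op[len("CLWB("):-1]] = (idx, wave[idx])
            busy[idx] = True
        elif op == "PersistBarrier":
            if busy[idx]:           # later CLWBs on this strand wait for this wave
                wave[idx] += 1
                busy[idx] = False
        elif op == "NewStrand":
            idx = (idx + 1) % num_buffers
            if busy[idx]:           # wrapped onto a busy buffer: queue behind it
                wave[idx] += 1
                busy[idx] = False
        elif op == "JoinStrand":
            if placement:           # later CLWBs wait for everything placed so far
                base = max(w for _, w in placement.values()) + 1
            wave = [base] * num_buffers
            busy = [False] * num_buffers
    return placement


running = ["CLWB(A)", "NewStrand", "CLWB(B)", "JoinStrand", "CLWB(C)"]
print(dispatch(running))  # {'A': (0, 0), 'B': (1, 0), 'C': (1, 1)}
```

A and B share wave 0 in different buffers, so they flush concurrently; C lands in wave 1 and waits for both, matching the JoinStrand ordering on the slide.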