Relaxed Persist Ordering Using Strand Persistency

Vaibhav Gogte, William Wang $, Stephan Diestelhorst $, Peter M. Chen, Satish Narayanasamy, Thomas F. Wenisch $. ISCA 2020.


  1. Relaxed Persist Ordering Using Strand Persistency. Vaibhav Gogte, William Wang $, Stephan Diestelhorst $, Peter M. Chen, Satish Narayanasamy, Thomas F. Wenisch $. ISCA 2020

  4. Promise of persistent memory (PM): performance, density, and non-volatility. On density: “Optane DC Persistent Memory will be offered in packages of up to 512GB per stick.” “… expanding memory per CPU socket to as much as 3TB.” (Source: www.extremetech.com) PM provides a byte-addressable, load-store interface to durable storage.

  7. Persistent memory system: a CPU with writeback caches in front of DRAM and persistent memory (PM). After a failure, recovery can inspect PM data structures to restore the system to a consistent state.

  13. Recovery requires PM access ordering. For the stores St a = x and St b = y, the consistency model governs the order in which they become visible, and the persistency model governs the order in which they persist to PM. Intel x86 primitives for recovery: St a = x; CLWB(a); SFENCE; St b = y; CLWB(b). Hardware systems provide primitives to express persist order to PM.
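
To see why the SFENCE matters for recovery, the sketch below enumerates which combinations of a and b PM could hold at a failure, given a set of persist-order constraints. It is a toy model rather than the x86 specification: `crash_states` and the edge-pair representation are hypothetical names introduced here, each persist is treated as atomic, and the unfenced sequence is assumed to imply no persist order.

```python
from itertools import combinations

def crash_states(persists, order):
    """Return every set of persists that PM may hold at a failure.

    persists: names of the persists, e.g. ["a", "b"].
    order: set of (earlier, later) pairs; 'later' may reach PM only
           after 'earlier' has.
    A state is reachable if, whenever it contains a persist, it also
    contains all of that persist's predecessors.
    """
    states = []
    for k in range(len(persists) + 1):
        for subset in combinations(persists, k):
            held = set(subset)
            if all(pre in held for (pre, post) in order if post in held):
                states.append(frozenset(held))
    return states

# St a; CLWB(a); SFENCE; St b; CLWB(b): a must persist before b.
with_fence = crash_states(["a", "b"], {("a", "b")})
# Same stores and CLWBs without the SFENCE (assumed: no persist order).
without_fence = crash_states(["a", "b"], set())

print(sorted(map(sorted, with_fence)))     # [[], ['a'], ['a', 'b']]
print(sorted(map(sorted, without_fence)))  # [[], ['a'], ['a', 'b'], ['b']]
```

With the fence, recovery never observes b without a; without it, the state holding only b becomes possible.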

  17. Hardware imposes overly strict constraints. Ideal DAG: St A = 1; CLWB (A); St B = 2; CLWB (B); St C = 3; CLWB (C), where only A must persist before B and C is unordered. DAG 1: St A = 1; CLWB (A); SFENCE; St B = 2; CLWB (B); St C = 3; CLWB (C), which also orders A before C. DAG 2: St A = 1; CLWB (A); St C = 3; CLWB (C); SFENCE; St B = 2; CLWB (B), which also orders C before B. Primitives in existing hardware systems overconstrain PM accesses.
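
The overconstraint can be made concrete by deriving the persist-order edges that each SFENCE placement implies and comparing them with the intended DAG. This is a minimal sketch under stated assumptions: `sfence_edges` is a hypothetical helper, each token stands for a store plus its CLWB, an SFENCE is modeled as ordering every earlier persist before every later one, and the ideal DAG is assumed to contain only the edge from A to B.

```python
def sfence_edges(program):
    """Persist-order edges implied by SFENCE in a single thread.

    program: list of tokens; "SFENCE" ends an epoch, any other token
    names a persist (a store followed by its CLWB).
    Every persist in an epoch is ordered before every persist in all
    later epochs; persists inside one epoch are unordered.
    """
    edges, earlier, current = set(), [], []
    for tok in program:
        if tok == "SFENCE":
            earlier += current   # everything so far precedes what follows
            current = []
        else:
            edges |= {(p, tok) for p in earlier}
            current.append(tok)
    return edges

ideal = {("A", "B")}  # assumed intent: only A before B; C is unordered
dag1 = sfence_edges(["A", "SFENCE", "B", "C"])
dag2 = sfence_edges(["A", "C", "SFENCE", "B"])

for name, dag in (("DAG 1", dag1), ("DAG 2", dag2)):
    print(name, "extra edges:", sorted(dag - ideal))
# DAG 1 extra edges: [('A', 'C')]
# DAG 2 extra edges: [('C', 'B')]
```

Neither placement of the fence yields exactly the ideal edge set; each adds one constraint involving C.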

  19. Contributions
      • Our proposal: StrandWeaver
        – Builds strand persistency model in hardware
        – Specifies precise persist ordering constraints
      • Comprises primitives: PersistBarrier, NewStrand, and JoinStrand
        – Can encode an arbitrary DAG
      • Map language-level persistency models to ISA-level primitives
        – Leverage hardware primitives to build persistency models efficiently
      StrandWeaver results in a 1.45x (avg.) speedup over Intel x86.

  20. Outline
      • Contributions
      • Example: Failure atomicity
      • Existing hardware vs. strand persistency model
      • Our proposal: StrandWeaver
      • Evaluation

  22. Example: Failure atomicity. Failure atomicity: which group of stores persists atomically? Failure-atomic region: atomic_begin(); x = 100; y = 200; atomic_end(). Failure atomicity limits the state that recovery can observe after failure.

  24. Undo logging for failure atomicity. Init: x = 0; y = 0. Failure-atomic region: atomic_begin(); x = 1; y = 2; atomic_end(). Steps: persistUndoLog (L), mutateData (M), persistData (P), commitLog (C). Undo logging steps are ordered to ensure failure atomicity.
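
The four steps can be sketched as a small simulation in which persistent memory is a Python dict and a failure is injected between steps. This is an illustration only: `atomic_update`, `recover`, `Crash`, and the `"log"` key are hypothetical names, and every write to the dict is assumed to persist immediately, which real hardware does not guarantee without the ordering primitives discussed above.

```python
class Crash(Exception):
    """Raised to simulate a failure at a chosen step."""

def atomic_update(pm, updates, crash_after=None):
    """Apply `updates` to the dict `pm` using the L, M/P, C steps.

    pm models persistent memory; pm["log"] is the undo log.
    crash_after: "L" or "P" to fail right after that step, or None.
    """
    # L: persist the undo log (old values) before touching the data.
    pm["log"] = {k: pm[k] for k in updates}
    if crash_after == "L":
        raise Crash
    # M + P: mutate the data and persist it.
    for k, v in updates.items():
        pm[k] = v
        if crash_after == "P":   # fail after the first store persists
            raise Crash
    # C: commit by invalidating the log.
    pm["log"] = None

def recover(pm):
    """Roll back a partially applied region using the undo log."""
    if pm.get("log"):
        pm.update(pm["log"])
        pm["log"] = None

pm = {"x": 0, "y": 0, "log": None}
try:
    atomic_update(pm, {"x": 1, "y": 2}, crash_after="P")
except Crash:
    pass
print(pm["x"], pm["y"])   # 1 0: a torn state that recovery must not expose
recover(pm)
print(pm["x"], pm["y"])   # 0 0: rolled back to the initial values
```

If the log were written after the data instead of before it, the torn state could not be rolled back, which is why the steps must persist in order.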

  26. Hardware imposes stricter constraints. Code: atomic_begin(); x = 1; y = 2; atomic_end(). Ideal ordering: Log(Lx,x); CLWB(Lx) before Store(x,1), and independently Log(Ly,y); CLWB(Ly) before Store(y,2). SFENCE ordering: Log(Lx,x); CLWB(Lx); SFENCE; Store(x,1); Log(Ly,y); CLWB(Ly); SFENCE; Store(y,2).
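
One way to quantify the extra constraints is to compare the longest chain of ordered persists under each ordering. The sketch below does that for the two log entries and two updates. The edge sets are a reading of the slide rather than anything taken from the authors' artifacts: `critical_path` is a hypothetical helper, and the SFENCE edge set assumes each fence orders the flushed log entry before everything issued after it.

```python
from functools import lru_cache

def critical_path(nodes, edges):
    """Length (in persists) of the longest chain in a persist-order DAG."""
    preds = {n: [a for (a, b) in edges if b == n] for n in nodes}

    @lru_cache(maxsize=None)
    def depth(n):
        # A node's depth is one more than its deepest predecessor.
        return 1 + max((depth(p) for p in preds[n]), default=0)

    return max(depth(n) for n in nodes)

nodes = ["Lx", "x", "Ly", "y"]   # log entries and in-place updates

# Ideal: each update only waits for its own log entry.
ideal = {("Lx", "x"), ("Ly", "y")}

# Log(Lx); CLWB(Lx); SFENCE; Store(x); Log(Ly); CLWB(Ly); SFENCE; Store(y)
# Assumed edges: the first fence also orders Lx before Ly.
sfence = {("Lx", "x"), ("Lx", "Ly"), ("Ly", "y")}

print(critical_path(nodes, ideal))    # 2
print(critical_path(nodes, sfence))   # 3
```

In this model the fence-based sequence serializes the two log writes, so the chain grows with the number of updates in the region, while the ideal ordering keeps it at two.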

  29. StrandWeaver: Hardware Strand Persistency Model. High-level languages: failure atomicity for language-level persistency models. Compiler: logging implementations that map to hardware primitives. Hardware ISA: ISA primitives PersistBarrier, NewStrand, JoinStrand.

  34. StrandWeaver enables persist concurrency
      • Provides primitives to express precise persist order
      • PersistBarrier: orders persists within a strand
      • NewStrand: initiates a new stream of persists
      • JoinStrand: merges previously initiated strands
      Example: Persist A; PersistBarrier; Persist B (strand 0); NewStrand; Persist C (strand 1); JoinStrand; Persist D. A persists before B, C is concurrent with A and B, and D persists after A, B, and C.
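
The three primitives can be read as rules for building a persist-order DAG from a single thread's instruction stream. The sketch below encodes one simplified interpretation: `strand_edges` is a hypothetical helper, and it ignores same-address ordering and inter-thread ordering, so it is a model of the example on this slide, not of the full persistency model.

```python
def strand_edges(program):
    """Derive persist-order edges from a stream of strand primitives.

    program: list of tokens; "PersistBarrier", "NewStrand", and
    "JoinStrand" are primitives, any other token names a persist.
    """
    edges = set()
    joined = set()          # persists every later persist must follow
    before_barrier = set()  # current strand, before the last barrier
    since_barrier = set()   # current strand, after the last barrier
    issued = set()          # all persists seen so far
    for op in program:
        if op == "PersistBarrier":
            before_barrier |= since_barrier
            since_barrier = set()
        elif op == "NewStrand":
            # A new strand carries no order from earlier strands.
            before_barrier, since_barrier = set(), set()
        elif op == "JoinStrand":
            # Everything issued so far precedes what follows.
            joined = set(issued)
            before_barrier, since_barrier = set(), set()
        else:
            edges |= {(p, op) for p in joined | before_barrier}
            since_barrier.add(op)
            issued.add(op)
    return edges

slide = ["A", "PersistBarrier", "B", "NewStrand", "C", "JoinStrand", "D"]
print(sorted(strand_edges(slide)))
# [('A', 'B'), ('A', 'D'), ('B', 'D'), ('C', 'D')]

# A before B with C unordered, which SFENCE alone cannot express:
print(sorted(strand_edges(["A", "PersistBarrier", "B", "NewStrand", "C"])))
# [('A', 'B')]
```

The second call shows how placing C on its own strand avoids ordering it against A or B.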

  37. StrandWeaver architecture: CPU, load-store queue, persist queue, strand buffer unit (SB0, SB1, …, SBn), L1 cache
      Persist queue
      • Manages ongoing StrandWeaver primitives
      • Orders CLWBs separated by JoinStrand
      Strand buffer unit
      • Issues CLWBs and flushes dirty cache lines
      • Ensures CLWBs on different strands are concurrent
      • Monitors coherence requests for inter-thread order

  38. Running example. Example code: CLWB(A); NewStrand; CLWB(B); JoinStrand; CLWB(C). The persist queue holds CLWB(A), NewStrand, CLWB(B), JoinStrand, CLWB(C). The strand buffer unit (SB0, SB1, selected by a buffer index) is empty.

  39. Running example (same code and persist queue as the previous slide). A now appears in the strand buffer unit.

  41. Running example (same code and persist queue: CLWB(A); NewStrand; CLWB(B); JoinStrand; CLWB(C)). A and B now both appear in the strand buffer unit; CLWB(B) follows NewStrand, so it belongs to a different strand than CLWB(A).
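
The running example can be traced with a small model of the persist queue feeding the strand buffers. The slides do not spell out the dispatch policy, so the sketch makes several assumptions: the buffer index starts at 0, NewStrand advances it round-robin, and JoinStrand waits until every strand buffer has drained. `dispatch` and its trace format are hypothetical.

```python
def dispatch(persist_queue, num_buffers=2):
    """Assign queued CLWBs to strand buffers, one step per queue entry.

    Assumptions (not from the slides): the buffer index starts at 0,
    NewStrand advances it round-robin, and JoinStrand waits for every
    strand buffer to drain before later entries are dispatched.
    Returns a list of (entry, buffer contents after the entry).
    """
    buffers = [[] for _ in range(num_buffers)]
    idx, trace = 0, []
    for op in persist_queue:
        if op == "NewStrand":
            idx = (idx + 1) % num_buffers
        elif op == "JoinStrand":
            for b in buffers:   # model the drain: earlier CLWBs complete
                b.clear()
        else:
            buffers[idx].append(op)
        trace.append((op, [list(b) for b in buffers]))
    return trace

queue = ["CLWB(A)", "NewStrand", "CLWB(B)", "JoinStrand", "CLWB(C)"]
for op, state in dispatch(queue):
    print(f"{op:<11} SB0={state[0]} SB1={state[1]}")
```

In this model CLWB(A) lands in SB0 and CLWB(B) in SB1, so the two flushes can proceed concurrently, and CLWB(C) is dispatched only after both have drained.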
