Truly Non-blocking Writes
Luis Useche, Ricardo Koller, Raju Rangaswami, Akshat Verma
HotStorage Workshop, 2011


  1. Truly Non-blocking Writes
     Luis Useche 2, Ricardo Koller 2, Raju Rangaswami 2, Akshat Verma 1
     1 IBM Research, India
     2 School of Computing and Information Sciences, College of Engineering and Computing
     HotStorage Workshop, 2011

  2. Introduction
     ◮ Memory access granularity is smaller than the disk's ⇒ a write to an out-of-core page requires a full page fetch.
     [Diagram: Process, OS, Backing Store. Steps: 1. Write(✗)  2. Miss  3. Issue page read  4. Complete  5. Return  6. Write(✔)]
     For writes: why wait for data that the application doesn't need?

  3. Non-blocking Writes: Basic Approach
     [Diagram: Process, OS, Backing Store. Steps: 1. Write(✗)  2. Miss  3. Buffer the write as a patch  4. Issue page read  5. Return  6. Complete  7. Merge the patch into the page]
     Benefits
     1. Reduced application execution time
     2. Increased backing-store bandwidth usage
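
     A minimal code sketch may make the buffer-and-merge sequence concrete. This is my own illustration under the assumption that a "patch" records the offset, data, and size of the deferred write and that merging is a plain memory copy; the names patch, nbw_buffer, and nbw_merge are hypothetical, and this is not the authors' kernel implementation.

    /* Illustrative sketch of the patch-buffer-and-merge idea (not the
     * authors' implementation). A write that misses is recorded as a
     * patch; the page fetch proceeds in the background and the patch
     * is merged once the page arrives. */
    #include <stdio.h>
    #include <string.h>

    #define PAGE_SIZE 4096

    struct patch {                 /* one buffered non-blocking write */
        size_t offset;             /* write offset within the page    */
        size_t nbytes;             /* size of the data written        */
        unsigned char data[64];    /* the written bytes (small, for the sketch) */
    };

    /* Step 3: buffer the write instead of waiting for the page. */
    static void nbw_buffer(struct patch *p, size_t off, const void *buf, size_t n)
    {
        p->offset = off;
        p->nbytes = n;
        memcpy(p->data, buf, n);
    }

    /* Step 7: merge the patch once the backing store returns the page. */
    static void nbw_merge(unsigned char page[PAGE_SIZE], const struct patch *p)
    {
        memcpy(page + p->offset, p->data, p->nbytes);
    }

    int main(void)
    {
        unsigned char page[PAGE_SIZE];
        struct patch p;

        nbw_buffer(&p, 100, "hello", 5);   /* write misses; return immediately */
        memset(page, 0xAA, sizeof page);   /* ...later: page arrives from disk */
        nbw_merge(page, &p);               /* reconcile the buffered write     */

        printf("%.5s\n", (const char *)page + 100);   /* prints "hello" */
        return 0;
    }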

  4. Motivation → Higher Fault Rates
     ◮ Memory is over-committed in virtualized environments
     ◮ More processes run concurrently on multi-core and virtualized systems
     ◮ The memory hierarchy is moving towards a more active and faster backing store

  5. Motivation → % Non-blocking Faults
     ◮ We calculate the % of faults that can benefit in all our workloads:
       Image Processing: rendering of SVG images
       Developer: unit and performance testing
       Server: application, database, and mail server
     ◮ Simulator driven by full-system memory traces
     ◮ RAM set to 50% of the application footprint
     ◮ Up to 80% of page faults can benefit
     [Chart: % of non-blocking faults per workload (Image Proc, Developer, Server), reaching roughly 80%]

  6. Related Work
     Alternatives to non-blocking writes:
     Perfect DRAM provisioning: unpredictable or unbounded.
     Prefetching: can incur false positives and false negatives.
     Asynchronous system calls:
       1. Do not work with memory-mapped pages
       2. Written data is not immediately available for reading

  7. Solution Challenges
     [Diagram: a supervised write enters the OS as the call write(buf, nbytes, dest_addr); an unsupervised write is a store instruction that raises fault(dest_addr). To defer either one, the OS must record a patch { new_buf, nbytes, dest_addr }.]

     Information per non-blocking write:

     Information    Supervised write()   Unsupervised fault
     Write offset   ✔                    ✔
     Data written   ✔                    ✗
     Size of data   ✔                    ✗
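
     The table can be read as a contrast between the two entry points: a supervised write() call hands the OS the offset, the data, and its size, while an unsupervised store only announces the faulting address. The sketch below is my own illustration of that contrast (hypothetical struct and function names, not the paper's code).

    /* Sketch contrasting what the OS learns from the two write paths.
     * Hypothetical types and names; not the paper's implementation. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    struct patch {
        void   *dest_addr;      /* write offset: known in both cases          */
        size_t  nbytes;         /* size of data: known only for write()       */
        unsigned char data[64]; /* data written: known only for write()       */
        bool    complete;       /* false for an unsupervised (faulting) store */
    };

    /* Supervised path: write(buf, nbytes, dest_addr) supplies everything. */
    static void patch_from_write(struct patch *p, const void *buf,
                                 size_t nbytes, void *dest_addr)
    {
        p->dest_addr = dest_addr;
        p->nbytes    = nbytes;
        memcpy(p->data, buf, nbytes);
        p->complete  = true;
    }

    /* Unsupervised path: fault(dest_addr) reveals only the address; the
     * data and its size must be recovered some other way (next slide). */
    static void patch_from_fault(struct patch *p, void *dest_addr)
    {
        p->dest_addr = dest_addr;
        p->nbytes    = 0;
        p->complete  = false;
    }

    int main(void)
    {
        unsigned char page_a[64], page_b[64];
        struct patch a, b;
        const char payload[] = "abc";

        patch_from_write(&a, payload, sizeof payload, page_a + 8);  /* all fields known */
        patch_from_fault(&b, page_b + 8);                           /* only the address */
        return (a.complete && !b.complete) ? 0 : 1;
    }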

  8. Handling Unsupervised Writes
     Approach                Description                                                      Fast  All arch?  Low mem?
     Full-feature hardware   fault() reports the write details directly                       ✔     ✗          ✔
     Opcode disassembly      decode the store (e.g., sw $t1, 0xff) for the offset and data    ✔     ✗          ✔
     Page diff-merge         0-buffer and 1-buffer diff-merged with the disk page             ✗     ✔          ✗
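
     The last row is only shown as a diagram on the slide, so the following is an assumption on my part: the unsupervised store is applied to two shadow buffers, one initialized to all 0 bits and one to all 1 bits; any byte at which the two buffers now agree must have been written, and those bytes are diff-merged onto the disk page once it arrives. A sketch under that assumption (it would also explain the trade-off columns: a full-page scan is slow and two shadow buffers cost memory, but no architecture-specific decoding is needed).

    /* Sketch of a 0-buffer/1-buffer diff-merge, as one plausible reading
     * of the slide's last row (assumed detail, not confirmed by the
     * slides). Bytes where the two shadow buffers agree must have been
     * written, since their initial contents (0x00 vs 0xFF) differ. */
    #include <stdio.h>
    #include <string.h>

    #define PAGE_SIZE 4096

    static void diff_merge(const unsigned char zero_buf[PAGE_SIZE],
                           const unsigned char ones_buf[PAGE_SIZE],
                           unsigned char disk_page[PAGE_SIZE])
    {
        for (size_t i = 0; i < PAGE_SIZE; i++)
            if (zero_buf[i] == ones_buf[i])   /* byte was overwritten          */
                disk_page[i] = zero_buf[i];   /* copy it onto the fetched page */
    }

    int main(void)
    {
        unsigned char zero_buf[PAGE_SIZE], ones_buf[PAGE_SIZE], disk_page[PAGE_SIZE];

        memset(zero_buf, 0x00, sizeof zero_buf);
        memset(ones_buf, 0xFF, sizeof ones_buf);
        memset(disk_page, 0x42, sizeof disk_page);   /* page later fetched from disk */

        /* The faulting store is replayed against both shadow buffers. */
        memcpy(zero_buf + 10, "data", 4);
        memcpy(ones_buf + 10, "data", 4);

        diff_merge(zero_buf, ones_buf, disk_page);   /* produce the updated page */
        printf("%.4s %02x\n", (const char *)disk_page + 10,
               (unsigned)disk_page[20]);             /* prints "data 42" */
        return 0;
    }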

  9. Quantifying Benefits
     1. Fraction of non-blocking write faults ✔
     2. Outstanding write faults (over time)
     3. Savings in execution time (new!)
     Virtual memory simulator
       Input: RAM size and full-system memory traces
       Output: performance statistics
     ◮ Memory size set to 50% of each workload's footprint
     ◮ Creating patches is not required

  10. Quantifying Benefits → Metric
      ◮ How do we measure the additional parallelism?
      ◮ Outstanding Write Faults (OWF): the number of parallel write faults at any time
        - OWF ≤ OIO
        - OWF ≤ 1 for single-threaded applications
        - OWF ≥ 0 when using non-blocking writes
      ◮ We need the variation over time as well
      ◮ E[OWF]: time-weighted average OWF
      [Chart: E[OWF] per workload (Image Proc, Developer, Server)]
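
      E[OWF] is a time-weighted average, so each sampled OWF value is weighted by how long it stays in effect. The sketch below is my own illustration of the metric, not the simulator's code; the sample timeline is invented.

    /* Time-weighted average of outstanding write faults, E[OWF]:
     * each sampled OWF value is weighted by how long it was in effect.
     * Illustrative only; not the paper's simulator. */
    #include <stdio.h>

    struct sample { double time; int owf; };   /* OWF held from this time onward */

    static double expected_owf(const struct sample *s, int n, double end_time)
    {
        double weighted = 0.0;
        for (int i = 0; i < n; i++) {
            double next = (i + 1 < n) ? s[i + 1].time : end_time;
            weighted += s[i].owf * (next - s[i].time);
        }
        return weighted / (end_time - s[0].time);
    }

    int main(void)
    {
        /* OWF is 1 for 2 time units, 4 for 1 unit, 2 for 2 units. */
        struct sample trace[] = { {0.0, 1}, {2.0, 4}, {3.0, 2} };
        printf("E[OWF] = %.2f\n", expected_owf(trace, 3, 5.0));  /* (2+4+4)/5 = 2.00 */
        return 0;
    }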

  11. Quantifying Benefits → Time Reduction
      ◮ These results are not in the paper
      ◮ Execution time = trace time + synchronous read time
      ◮ The write time of dirty pages on eviction is ignored
      ◮ Rough estimate: the error is proportional to the number of dirty pages evicted
      [Chart: % execution-time decrease per workload (Image Proc, Developer, Server)]
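
      Under this model the estimate reduces to simple arithmetic: the synchronous read time eliminated by non-blocking writes, divided by the baseline execution time. The sketch below just spells out that calculation; all numbers are invented for illustration.

    /* Execution time = trace time + synchronous read time (slide's model).
     * Non-blocking writes remove the synchronous reads caused by write
     * faults, so the estimated decrease is that read time over the
     * baseline. Numbers are made up for illustration. */
    #include <stdio.h>

    int main(void)
    {
        double trace_time       = 80.0;  /* seconds of traced execution          */
        double sync_read_time   = 20.0;  /* seconds spent in blocking page reads */
        double reads_eliminated = 12.0;  /* portion attributable to write faults */

        double baseline = trace_time + sync_read_time;
        double improved = baseline - reads_eliminated;

        printf("Estimated decrease: %.1f%%\n",
               100.0 * (baseline - improved) / baseline);   /* 12.0% here */
        return 0;
    }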

  12. Conclusions and Future Work
      ◮ We presented non-blocking writes, a technique to eliminate read-before-write page fetches
        - Reduced execution time
        - Increased device usage
      ◮ We estimate execution-time reductions of 0.1-54%
      ◮ In the future, we plan to implement non-blocking writes to better study their implications
        - Which workloads benefit from non-blocking writes?

  13. Questions?
