
Getting More Performance with Polymorphism from Emerging Memory Technologies - PowerPoint PPT Presentation



  1. Getting More Performance with Polymorphism from Emerging Memory Technologies. Iyswarya Narayanan, Aishwarya Ganesan, Anirudh Badam, Sriram Govindan, Bikash Sharma, Anand Sivasubramaniam

  2. The resource needs of cloud storage applications span multiple aspects: volatile and persistent accesses, high capacity, and high performance (a trimmed tail of the latency distribution).

  3. Cloud applications are diverse in their capacity needs for both volatile reads and persistent writes. [Chart: fraction of unique pages accessed within a time window, for reads and writes, across Cloud Storage, MapReduce, Search-Index, and Search-Serve; the workloads range from read-and-write intensive to read intensive to write intensive.]

  4. Cloud applications are also diverse in the volume of read and write accesses, both across different applications and temporally within the same application (e.g., Search-Serve). How can we effectively provision memory and storage resources for such diverse cloud storage applications?

  5. DRAM and SSD are the memory and storage resources available today. DRAM is volatile, low latency, and low capacity; SSD is persistent, high latency, and high capacity. [Figure: the latency/cost/capacity trade-off.]

  6. They are rigid in their performance characteristics: in function (memory or storage, volatile or persistent), in latency (fast DRAM vs. slow SSD), and in capacity (fixed by the server SKU).

  7. Can emerging memories help meet the diverse resource needs of cloud storage apps across several dimensions? They are non-volatile, have lower latencies than SSD, and offer larger and more flexible capacity and latency (e.g., 3D XPoint, compressed memory, battery-backed DRAM). Can we exploit these emerging memory technologies to overcome the drawbacks of existing resources?

  8. What are the design choices for integrating emerging memory technologies in cloud servers?
     • Persistent memory programming: benefits both volatile and persistent accesses, but requires intrusive code changes to applications.
     • NVM-based file systems: no changes to applications, but high NVM provisioning cost to cover the entire storage need, plus intrusive code changes to the OS and file system.
     • Transparent cache (as memory or storage): low cost and transparent, but benefits either reads or writes, not both.

  9. Emerging memory technologies are polymorphic: they can function as both memory and storage. As memory, they serve as a volatile memory extension with direct access via loads and stores; as storage, they serve as a persistent write cache above the SSD, accessed through the block interface. Can we exploit this functional polymorphism knob?
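To make the two functions concrete, here is a minimal Python sketch of the two ways a polymorphic NVM device can be used, assuming it is exposed both as a DAX character device (for load/store access) and as a block device (for the write-cache path). The device paths and sizes are illustrative assumptions, not details from the presentation.

```python
# Minimal sketch of the two faces of a polymorphic NVM device. The paths
# /dev/dax0.0 and /dev/pmem0 and the 64 MiB size are assumptions for
# illustration only.
import mmap
import os

NVM_SLICE = 64 * 1024 * 1024  # 64 MiB slice used in this example


def use_as_memory_extension(path="/dev/dax0.0"):
    """'Memory' function: map the device and touch it with loads/stores."""
    fd = os.open(path, os.O_RDWR)
    buf = mmap.mmap(fd, NVM_SLICE)   # direct access once mapped
    buf[0:4] = b"warm"               # a store to the mapped region
    value = bytes(buf[0:4])          # a load from the mapped region
    buf.close()
    os.close(fd)
    return value


def use_as_write_cache(path="/dev/pmem0", block=4096):
    """'Storage' function: persist a dirty block through the block interface."""
    fd = os.open(path, os.O_RDWR | os.O_DSYNC)  # synchronous block writes
    os.pwrite(fd, b"\x00" * block, 0)           # write one block at offset 0
    os.close(fd)
```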

  10. Functional polymorphism can benefit applications with competing volatile and persistent flows: use dm-cache to turn part of the NVM into a write cache, and expose the rest as additional memory accessible via loads/stores. [Chart: MySQL TPC-C tail latency vs. % of NVM used as write cache.] Partitioning the NVM between memory and storage reduces latency. But what if the working set exceeds the physical memory/write-cache capacity?
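A sketch of the sweep behind this slide's chart, under the assumption that one can reconfigure the split and rerun the benchmark at each point. The two hooks are hypothetical placeholders, not dm-cache or PolyEMT APIs; in practice they would reconfigure the dm-cache device and run MySQL TPC-C.

```python
# Hypothetical sweep over the functional-polymorphism knob: what fraction of
# the NVM should serve as a persistent write cache vs. extra memory?
NVM_TOTAL_BYTES = 32 << 30  # assumed NVM size, illustrative only


def reconfigure_partition(write_cache_bytes: int, memory_bytes: int) -> None:
    # Placeholder hook: resize the write cache (e.g., a dm-cache device) and
    # expose the remaining capacity as load/store-accessible memory.
    print(f"write cache: {write_cache_bytes >> 20} MiB, "
          f"memory extension: {memory_bytes >> 20} MiB")


def run_benchmark_p99() -> float:
    # Placeholder hook: run the workload (e.g., MySQL TPC-C) and return the
    # observed tail latency in milliseconds.
    return 0.0


def sweep(points=(0, 25, 50, 75, 100)):
    results = {}
    for pct in points:
        cache = NVM_TOTAL_BYTES * pct // 100
        reconfigure_partition(cache, NVM_TOTAL_BYTES - cache)
        results[pct] = run_benchmark_p99()
    # Pick the split with the lowest observed tail latency.
    return min(results, key=results.get), results
```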

  11. Impact of insufficient physical capacity and fixed resource characteristics on application performance: the application's working set is split between two fixed-latency tiers, DRAM and SSD. [Figure: probability vs. access latency, with the 95th percentile marked.] Tail latency is determined by the slowest tier.

  12. Representational polymorphism is a knob to tune latency and capacity. With the working set split between two fixed-latency tiers, the faster tier morphs to hold more of the working set, and the 95th-percentile tail latency drops. [Figure: probability vs. access latency before and after morphing.]

  13. Representational polymorphism can benefit applications. [Charts: compressed access latency (us) for read and write accesses at granularities of 4096, 2048, 1024, and 512 bytes, much lower than SSD latency; and % increase in effective capacity for MapReduce and Search-Serve, a 2X to 8X gain.] Our goal: effectively serve diverse cloud applications using a polymorphic emerging-memory-based cache.
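The capacity gain comes from storing resident data compressed. A small self-contained sketch of the arithmetic, using zlib as a stand-in compressor and synthetic half-compressible pages (the slide's 2X to 8X figures come from the paper's workloads, not from this snippet):

```python
# Illustrative only: compress 4 KiB pages and estimate effective capacity.
import os
import zlib

PAGE = 4096


def effective_capacity(pages, physical_bytes):
    """Bytes of application data that fit in physical_bytes once pages are
    stored compressed."""
    compressed = sum(len(zlib.compress(p, level=1)) for p in pages)
    ratio = (len(pages) * PAGE) / compressed
    return physical_bytes * ratio


# Synthetic pages that are roughly half compressible; real ratios are
# workload dependent.
pages = [os.urandom(PAGE // 2) + bytes(PAGE // 2) for _ in range(256)]
print(f"effective capacity of 6 GiB: "
      f"{effective_capacity(pages, 6 << 30) / (1 << 30):.1f} GiB")
```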

  14. PolyEMT: a Polymorphic Emerging Memory Technology based cache. An unmodified application runs above DRAM (nanosecond latency), NVM (roughly 1 us through the memory interface and 10 us through the block interface), and SSD (roughly 100 us); the NVM can be battery-backed DRAM, 3D XPoint, etc. Knob 1, functional polymorphism: memory vs. storage? Cloud applications are diverse, so one partition size does not fit all.

  15. PolyEMT adds a second knob to the same stack, with the NVM optionally held in compressed form. Knob 1, functional polymorphism: memory vs. storage? Knob 2, representational polymorphism: capacity vs. latency? We need to navigate the performance trade-off across the capacity, latency, and persistence dimensions.

  16. Key idea of the PolyEMT cache:
     • Address the most significant bottleneck first using the emerging-memory-based cache.
     • Then gradually morph its characteristics to further improve performance.
     What is the most significant bottleneck for a generic application with a mix of reads and writes?

  17. Persistent writes (file writes, flushes, msyncs) incur high latency in existing systems: the persistent tier is much slower, and SSDs are asymmetric in their read/write latency, with writes slower than reads. [Chart: average and 95th-percentile latency (ms) for SSD reads vs. SSD writes.] So, use BB-DRAM as a write cache in front of the SSD: read misses still go to the SSD through the block file system, while persistent writes are absorbed by the BB-DRAM write cache.
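A quick, generic way to see why durable writes dominate: time a buffered write against one forced to stable media with fsync(). This uses a temporary file on whatever disk backs it and is only an illustration, not the paper's measurement setup.

```python
# Rough illustration of the cost of making a write persistent.
import os
import tempfile
import time


def time_write(path, data, sync):
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    t0 = time.perf_counter()
    os.write(fd, data)
    if sync:
        os.fsync(fd)  # force the data down to persistent media
    elapsed = time.perf_counter() - t0
    os.close(fd)
    return elapsed


data = b"x" * 4096
with tempfile.TemporaryDirectory() as d:
    buffered = time_write(os.path.join(d, "buf"), data, sync=False)
    durable = time_write(os.path.join(d, "dur"), data, sync=True)
    print(f"buffered: {buffered * 1e6:.0f} us, durable: {durable * 1e6:.0f} us")
```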

  18. Keeping the EMT entirely in the write cache is inefficient for read accesses, because the resource is byte addressable. Instead of using it only as a write cache, use it as both a write cache and a memory extension, so read misses can also be served from the byte-addressable tier rather than the SSD. How should the NVM capacity be apportioned between the memory and storage functions?

  19. Tuning write-cache capacity in the presence of competing read and write flows. [Chart: normalized persistent write latency vs. % of BB-DRAM devoted to storage.]

  20. Tuning write-cache capacity in the presence of competing read and write flows. [Charts: normalized volatile (read) latency vs. % of EMT in memory, alongside normalized persistent write latency vs. % of EMT in storage.]

  21. Balance the overall impact of read and write accesses. [Charts: normalized volatile latency, persistent write latency, and application performance as the EMT split between memory and storage varies.] Incrementally repurpose write-cache blocks as memory pages to balance read and write performance.
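One way to read the "incrementally repurpose" rule is as a feedback loop that shifts EMT capacity toward whichever flow currently hurts more. The sketch below is an assumption about how such a controller could look; the monitoring hooks, step size, and tolerance are invented, not PolyEMT's actual policy.

```python
# Hedged sketch of a balance loop between the memory-extension and write-cache
# roles of the EMT. The three hooks are hypothetical, not real PolyEMT APIs.
import time

STEP_BLOCKS = 1024   # 4 KiB blocks moved per adjustment (assumed)
TOLERANCE = 0.10     # latencies within 10% are considered balanced (assumed)


def read_latency() -> float:
    return 1.0       # placeholder: observed volatile (read-miss) latency


def write_latency() -> float:
    return 1.0       # placeholder: observed persistent-write latency


def repurpose_blocks(n: int, to: str) -> None:
    # Placeholder: move n blocks of EMT capacity to the "memory" or "storage"
    # function, e.g., shrink the write cache and map the freed pages as memory.
    print(f"moving {n} blocks to {to}")


def balance_loop(interval_s: float = 5.0, iterations: int = 3) -> None:
    for _ in range(iterations):
        r, w = read_latency(), write_latency()
        if r > w * (1 + TOLERANCE):
            repurpose_blocks(STEP_BLOCKS, to="memory")   # reads hurt more
        elif w > r * (1 + TOLERANCE):
            repurpose_blocks(STEP_BLOCKS, to="storage")  # writes hurt more
        time.sleep(interval_s)
```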

  22. When the physical capacity is insufficient, exploit representational polymorphism: go from the functional polymorphic cache (DRAM plus BB-DRAM split between memory extension and write cache) to a functional + representational polymorphic cache, where both the memory side and the write cache are held in compressed BB-DRAM. Once both sides are compressed, there is no latency benefit from keeping the memory and storage functions separate!

  23. Since separating the functions no longer pays off, use a shared compressed representation: read misses and persistent writes are both served from a single shared compressed BB-DRAM pool above the block file system and the SSD. The shared compression layer reduces compute requirements too.
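To illustrate the shared compressed representation, here is a minimal Python sketch of one pool that stores both volatile pages and persistent write-cache blocks in compressed form. zlib is a stand-in compressor and the class design is an assumption, not PolyEMT's actual data structure.

```python
# Minimal sketch of a shared compressed pool serving both functions: volatile
# pages (memory extension) and persistent write-cache blocks live side by side
# in one compressed store.
import zlib


class SharedCompressedPool:
    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries: dict[tuple[str, int], bytes] = {}  # (kind, id) -> blob

    def put(self, kind: str, key: int, data: bytes) -> bool:
        """kind is 'page' (volatile) or 'block' (persistent write cache)."""
        blob = zlib.compress(data, level=1)
        old = self.entries.get((kind, key))
        new_used = self.used - (len(old) if old else 0) + len(blob)
        if new_used > self.capacity:
            return False            # caller must evict or write back to SSD
        self.entries[(kind, key)] = blob
        self.used = new_used
        return True

    def get(self, kind: str, key: int) -> bytes | None:
        blob = self.entries.get((kind, key))
        return zlib.decompress(blob) if blob is not None else None


pool = SharedCompressedPool(capacity_bytes=6 << 30)
pool.put("page", 42, b"A" * 4096)    # a volatile page from a read miss
pool.put("block", 7, b"B" * 4096)    # a dirty block awaiting SSD write-back
```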

  24. PolyEMT optimization steps at a glance. On scheduling a new application: (1) use the EMT as a persistent write-back cache, then (2) exploit functional polymorphism. On dynamic phase changes within an application: (3) exploit representational polymorphism, and (4) LRU-based capacity management.

  25. PolyEMT prototype
     • PolyEMT library and runtime
     • mmap(): native load/store access
     • msync(): persist dirty data to the NVM write cache; data is persisted to the SSD in the background
     • More implementation details in the paper
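For reference, the application-visible pattern PolyEMT intercepts is the standard mmap()/msync() workflow. The Python sketch below shows that pattern (mmap.flush() issues msync() under the hood); the file name and sizes are illustrative, and this is not the PolyEMT library itself.

```python
# Standard mmap/msync-style usage that PolyEMT transparently serves.
import mmap
import os

PAGE = 4096
PATH = "store.dat"  # illustrative file name

# Create a backing file and map it for load/store access.
with open(PATH, "wb") as f:
    f.truncate(16 * PAGE)

fd = os.open(PATH, os.O_RDWR)
region = mmap.mmap(fd, 16 * PAGE)

# Native load/store access through the mapping.
region[0:5] = b"hello"

# Persist the dirty range (msync). Under PolyEMT this lands in the NVM write
# cache, with write-back to the SSD happening in the background.
region.flush(0, PAGE)

region.close()
os.close(fd)
```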

  26. Evaluation Setup
     • Azure VM: DRAM (26 GB), battery-backed DRAM (6 GB), SSD, CPU-based compression
     • Redis key-value store with persistence enabled
     • Data set size: 38 GB, much larger than the combined DRAM + BB-DRAM capacity
     • YCSB benchmarks

  27. Transparent integration policies under evaluation
     • DRAM-Extension
     • Write-Cache
     • Write-Cache + Functional polymorphism
     • Write-Cache + Functional polymorphism + Representational polymorphism

  28. Performance benefits of PolyEMT on throughput. [Chart: throughput of the Write-Cache, Functional, and Functional+Representational policies, normalized to DRAM-Extension, for YCSB workloads a-f and their mean: roughly 2.5X, 4.55X, and 5X on average.] Addressing the most significant bottleneck improves performance by 2.5X; exploiting the polymorphisms further improves performance by 70% and 90%.
