
Storage Class Memory: Towards a disruptively low-cost solid-state non-volatile memory



  1. Storage Class Memory: Towards a disruptively low-cost solid-state non-volatile memory. Science & Technology, Almaden Research Center, January 2013

  2. Storage Class Memory: Power & space in the server room
     The cache/memory/storage hierarchy is rapidly becoming the bottleneck for large systems. We know how to create MIPS & MFLOPS cheaply and in abundance, but feeding them with data has become the performance-limiting and most expensive part of a system (in both $ and watts).
     Extrapolation to 2020 (at 70% CGR → need 2 GIOP/sec):
     • 5 million HDD → 16,500 sq. ft. → 22 megawatts
     Source: R. Freitas and W. Wilcke, "Storage Class Memory: the next storage system technology," in the "Storage Technologies & Systems" special issue of the IBM Journal of R&D (2008).
     Science & Technology – IBM Almaden Research Center, Jan 2013
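The 2020 extrapolation on slide 2 implies per-drive figures that can be checked with simple arithmetic. The following is a minimal sketch that uses only the numbers quoted on the slide; the variable names are illustrative, and the per-drive results are derived here rather than stated in the deck.

```python
# Figures quoted on the slide (2020 extrapolation at 70% CGR).
required_iops = 2e9          # 2 GIOP/sec
hdd_count = 5e6              # 5 million HDD
floor_space_sqft = 16_500    # 16,500 sq. ft.
power_watts = 22e6           # 22 megawatts

# Derived per-drive values (not stated on the slide).
iops_per_hdd = required_iops / hdd_count
watts_per_hdd = power_watts / hdd_count
sqft_per_1000_hdd = floor_space_sqft / hdd_count * 1000

print(f"{iops_per_hdd:.0f} I/O operations per second per drive")
print(f"{watts_per_hdd:.1f} W per drive")
print(f"{sqft_per_1000_hdd:.1f} sq. ft. per 1000 drives")
```

Read this way, the slide's estimate corresponds to a few hundred I/O operations per second and a few watts per drive, so the drive count is set by the I/O rate rather than by capacity.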

  3. Storage Class Memory: …yet critical applications are also undergoing a paradigm shift
     Compute-centric paradigm
     • Main focus: solve differential equations
     • Bottleneck: CPU / memory
     • Typical examples: computational fluid dynamics, finite element analysis, multi-body simulations
     Data-centric paradigm
     • Main focus: analyze petabytes of data
     • Bottleneck: storage & I/O
     • Typical examples: search and mining, analyses of social/terrorist networks, sensor network processing, digital media creation/transmission, environmental & economic modeling
     Extrapolation to 2020 [Freitas:2008]
     • Requirements: at 90% CGR → need 1.7 PB/sec; at 90% CGR → need 8.4G SIO/sec
     • Disk-based estimates: 21 million HDD → 70,000 sq. ft. → 93 megawatts; 5.6 million HDD → 19,000 sq. ft. → 25 megawatts

  4. Storage Class Memory: Problem (& opportunity): the access-time gap between memory & storage
     [Figure, 1980: access time in ns on a log scale from 1 to 10^10, next to the same delay in human perspective (T x 10^9) from a second to a millennium. Cost decreases down the hierarchy: ON-chip memory (CPU), OFF-chip memory (RAM), ON-line storage (DISK), OFF-line storage (TAPE). Access times shown: CPU operations (1 ns); get data from L2 cache (<5 ns); get data from DRAM/SCM (60 ns); read or write to DISK (5 ms); get data from TAPE (40 s).]
     • Modern computer systems have long had to be designed around hiding the access gap between memory and storage → caching, threads, predictive branching, etc.
     • "Human perspective": if a CPU instruction is analogous to a 1-second decision by a human, retrieval of data from off-line tape represents an analogous delay of 1250 years.
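The "human perspective" scaling on slide 4 multiplies every access time by 10^9, so an access time of N nanoseconds becomes N seconds of human time. The sketch below applies that scaling to the slide's access times; the unit table, the 365.25-day year, and the function name are assumptions made for illustration.

```python
# Access times from the slide, in nanoseconds.
ACCESS_TIME_NS = {
    "CPU operation": 1,
    "L2 cache (upper bound)": 5,
    "DRAM/SCM": 60,
    "DISK read or write": 5_000_000,   # 5 ms
    "TAPE": 40_000_000_000,            # 40 s
}

# Human-scale units in seconds, largest first.
UNITS = [
    ("years", 365.25 * 24 * 3600),     # assumption: 365.25-day year
    ("days", 24 * 3600),
    ("hours", 3600),
    ("minutes", 60),
    ("seconds", 1),
]

def human_perspective(access_time_ns):
    """Scale by 10^9 and return (value, unit) in the largest fitting unit."""
    # N ns * 1e-9 s/ns * 1e9 = N seconds, so the number itself is unchanged.
    human_seconds = access_time_ns
    for unit, size in UNITS:
        if human_seconds >= size:
            return human_seconds / size, unit
    return float(human_seconds), "seconds"

for name, ns in ACCESS_TIME_NS.items():
    value, unit = human_perspective(ns)
    print(f"{name}: {value:.1f} {unit}")
```

Under these assumptions the tape case comes out at roughly 1,270 years, in line with the slide's rounded figure of 1250 years, while a disk access corresponds to about two months.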

  5. Storage Class Memory: Problem (& opportunity): the access-time gap between memory & storage
     [Figure, 1980 vs. today: the same access-time scale in ns, from 1 to 10^10, with cost decreasing down the hierarchy. 1980: CPU, RAM, DISK, TAPE. Today: CPU, RAM, FLASH SSD, DISK, TAPE, with the memory/storage gap marked between OFF-chip memory and ON-line storage. Access times shown: CPU operations (1 ns); get data from L2 cache (<5 ns); get data from DRAM/SCM (60 ns); read a FLASH device (20 us); write to FLASH, random (1 ms); read or write to DISK (5 ms); get data from TAPE (40 s).]
     • Today, solid-state disks based on NAND Flash can offer fast ON-line storage, and storage capacities are increasing as devices scale down to smaller dimensions…
     • …but while prices are dropping, the performance gap between memory and storage remains significant, and the already-poor device endurance of Flash is getting worse.
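The size of the memory/storage gap on slide 5 can be expressed as a ratio to the DRAM access time. This is a small sketch using only the latencies quoted on the slide; the dictionary and variable names are illustrative.

```python
import math

DRAM_NS = 60  # "Get data from DRAM/SCM (60ns)"

# Storage-side latencies from the slide, in nanoseconds.
STORAGE_LATENCY_NS = {
    "FLASH read": 20_000,              # 20 us
    "FLASH random write": 1_000_000,   # 1 ms
    "DISK read or write": 5_000_000,   # 5 ms
}

# How many DRAM accesses fit into one storage access.
ratios = {name: ns / DRAM_NS for name, ns in STORAGE_LATENCY_NS.items()}

for name, ratio in ratios.items():
    orders = math.log10(ratio)
    print(f"{name}: {ratio:,.0f}x DRAM (about {orders:.1f} orders of magnitude)")
```

Even a Flash read sits more than two orders of magnitude above DRAM, and a disk access close to five, which is the gap the slide argues remains significant.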

  6. Storage Class Memory: Problem (& opportunity): the access-time gap between memory & storage
     [Figure, near future: the same access-time scale in ns, from 1 to 10^10, with cost decreasing down the hierarchy. Hierarchy: CPU, RAM, SCM, DISK, TAPE, with SCM placed in the memory/storage gap. Access times shown: CPU operations (1 ns); get data from L2 cache (<5 ns); get data from DRAM/SCM (60 ns); read a FLASH device (20 us); write to FLASH, random (1 ms); read or write to DISK (5 ms); get data from TAPE (40 s).]
     Research into new solid-state non-volatile memory candidates, originally motivated by finding a "successor" for NAND Flash, has opened up several interesting ways to change the memory/storage hierarchy:
     1) Embedded Non-Volatile Memory: low-density, fast ON-chip NVM
     2) Embedded Storage: low-density, slower ON-chip storage
     3) M-type Storage Class Memory: high-density, fast OFF- (or ON*)-chip NVM
     4) S-type Storage Class Memory: high-density, very-near-ON-line storage
     * ON-chip using 3-D packaging

  7. Storage Class Memory: Storage-type vs. memory-type Storage Class Memory
     [Figure: speed (latency & bandwidth), shown as read latency on a log scale from 10 ns to 100 us, plotted against cell size in F^2 (2 to 10, smaller meaning lower cost). NAND and DRAM mark the storage-type and memory-type uses; other labels: Power! (Write), Cost/bit, Endurance. Inset: a 4F^2 cell drawn at pitch F.]
     The cost basis of semiconductor processing is well understood. The paths to higher density are:
     1) shrinking the minimum lithographic pitch F, and
     2) storing more bits per 4F^2.
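The two density paths named on slide 7 can be made concrete with the usual cell-area arithmetic: area per bit = (cell size in units of F^2) x F^2 / (bits stored per cell). The sketch below is illustrative only; the pitch values of 40 nm and 20 nm and the function names are assumptions, not figures from the deck.

```python
NM2_PER_CM2 = 1e14  # 1 cm = 1e7 nm, so 1 cm^2 = 1e14 nm^2

def area_per_bit_nm2(f_nm, cell_factor=4.0, bits_per_cell=1):
    """Silicon area per bit for a cell of cell_factor * F^2 holding bits_per_cell bits."""
    return cell_factor * f_nm ** 2 / bits_per_cell

def bits_per_cm2(f_nm, cell_factor=4.0, bits_per_cell=1):
    """Areal bit density, ignoring array overhead (an idealization)."""
    return NM2_PER_CM2 / area_per_bit_nm2(f_nm, cell_factor, bits_per_cell)

# Hypothetical pitches, chosen only to show the scaling behaviour.
baseline = bits_per_cm2(f_nm=40)
shrunk = bits_per_cm2(f_nm=20)                       # path 1: halve the pitch F
multi_bit = bits_per_cm2(f_nm=40, bits_per_cell=2)   # path 2: two bits per 4F^2

print(f"halving F multiplies density by {shrunk / baseline:.0f}")
print(f"doubling bits per 4F^2 multiplies density by {multi_bit / baseline:.0f}")
```

Path 1 scales density with the square of the shrink, while path 2 scales it linearly with the number of bits held in each 4F^2 footprint, whether through multi-level cells or stacked layers.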

  8. Storage Class Memory: S-type vs. M-type SCM
     [Figure: block diagram with CPU, DRAM, memory controller, I/O controller, storage controller, SCM at several attachment points, and disk, divided into internal and external parts; annotated "~1 us read latency".]
     M-type: synchronous
     • Hardware managed
     • Low overhead
     • Processor waits
     • New NVM → not Flash
     • Cached or pooled memory
     • Persistence (data survives despite component failure or loss of power) requires redundancy in system architecture
     S-type: asynchronous
     • Software managed
     • High overhead
     • Processor doesn't wait (process-, thread-switching)
     • Flash or new NVM
     • Paging or storage
     • Persistence → RAID

  9. Storage Class Memory: Competitive outlook among emerging NVMs
     [Figure: application areas arranged along "High speed" and "Low co$t" axes.]
     Future NAND applications (consumer devices, etc.)
     • 3-D NAND (but crossover to succeed 20nm conventional NAND may require > 50 layers!)
     • PCM?/RRAM?
     Future NOR applications (program code, etc.)
     • PCM (but market disappearing)
     Embedded Storage (low-density, slower ON-chip storage)
     • NAND? (but complicated process)
     • RRAM?/PCM?
     S-type Storage Class Memory (high-density, very-near-ON-line storage)
     1) PCM?/RRAM?
     2) Racetrack? (future?)
     M-type Storage Class Memory (high-density, fast OFF- (or ON*)-chip NVM)
     • CBRAM? STT-RAM?
     • PCM?/RRAM?
     • Racetrack? (future?)
     Embedded Non-Volatile Memory (low-density, fast ON-chip NVM)
     • STT-RAM? CBRAM?
     * ON-chip using 3-D packaging

  10. Storage Class Memory: Device paths towards SCM
      [Figure: device technologies mapped to applications, with axes labeled Availability, Co$t → Capital investment, and Applications.]
      • NAND → 3-D NAND → future NAND applications (consumer devices, etc.); an "unlikely, but possible path" is also marked
      • Emerging NVM (RRAM? PCM? CBRAM?), 1-10us → Embedded Storage (low-density, slower ON-chip storage) and S-type SCM (high-density, near-ON-line storage)
      • Emerging NVM (STT-RAM? CBRAM? PCM??/RRAM??), ¿ 1us → Embedded Non-Volatile Memory (low-density, fast ON-chip NVM) and M-type SCM (high-density, fast OFF- (or ON*)-chip NVM)
      • DRAM → future DRAM (working memory, etc.)
      * ON-chip using 3-D packaging

  11. Storage Class Memory: NVM candidates for SCM
      [Figure: generic SCM array; each cell is an NVM memory element plus an access device.]
      1) NVM element
         • Improved FLASH
         • Magnetic Spin Torque Transfer
           → STT-RAM
           → Magnetic Racetrack
         • Phase Change RAM
         • Resistive RAM
      2) High-density access device (A.D.)
         • 2-D: silicon transistor or diode
         • 3-D → higher density per 4F^2
           • polysilicon diode (but < 400 °C processing?)
           • MIEC A.D. (Mixed Ionic-Electronic Conduction)
           • OTS A.D. (Ovonic Threshold Switch)
           • Conductive oxide tunnel barrier A.D.

  12. Storage Class Memory: Limitations of Flash
      [Figure: two log-scale bar charts comparing USB disk, laptop, and enterprise Flash devices: maximum random read vs. maximum random write IOPS, and sustained read vs. sustained write bandwidth in MB/s.]
      Asymmetric performance
      • Writes much slower than reads
      Program/erase cycle
      • Block-based, no write-in-place
      Data retention and non-volatility
      • Retention gets worse as Flash scales down
      Endurance
      • Single level cell (SLC) → 10^5 writes/cell
      • Multi level cell (MLC) → 10^4 writes/cell
      • Triple level cell (TLC) → ~300 writes/cell
      Future outlook
      • Scaling focused solely on density
      • 3-D schemes exist but are complex
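To see what the endurance figures on slide 12 mean in practice, one can estimate how long a device would last if it were written continuously. The sketch below uses a deliberately simple model: ideal wear leveling, no write amplification, and writes at a constant rate. The 256 GB capacity and 40 MB/s write rate are hypothetical values chosen for illustration; only the writes-per-cell figures come from the slide.

```python
SECONDS_PER_DAY = 24 * 3600

# Endurance figures from the slide, in writes per cell.
ENDURANCE_WRITES_PER_CELL = {
    "SLC": 10**5,
    "MLC": 10**4,
    "TLC": 300,   # the slide gives "~300"
}

def lifetime_days(writes_per_cell, capacity_bytes, write_rate_bytes_per_s):
    """Days until every cell reaches its write limit.

    Assumes ideal wear leveling and no write amplification, so the total
    writable volume is simply capacity * writes_per_cell.
    """
    total_writable_bytes = capacity_bytes * writes_per_cell
    return total_writable_bytes / write_rate_bytes_per_s / SECONDS_PER_DAY

# Hypothetical device and workload (not from the deck).
CAPACITY_BYTES = 256e9       # 256 GB
WRITE_RATE_BYTES_PER_S = 40e6  # 40 MB/s, written around the clock

for cell_type, writes in ENDURANCE_WRITES_PER_CELL.items():
    days = lifetime_days(writes, CAPACITY_BYTES, WRITE_RATE_BYTES_PER_S)
    print(f"{cell_type}: about {days:,.0f} days ({days / 365.25:.1f} years)")
```

Under these assumptions the lifetime scales linearly with writes per cell, so moving from SLC to TLC shortens it by a factor of several hundred. Real devices differ because of write amplification, over-provisioning, and workloads that are far from continuous.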
