
Onyx: A Prototype Phase-Change Memory Storage Array

Ameen Akel, Adrian Caulfield, Todor Mollov, Rajesh Gupta, Steven Swanson. Non-Volatile Systems Laboratory, Department of Computer Science and Engineering, University of California, San Diego.


  1. Onyx: A Prototype Phase-Change Memory Storage Array. Ameen Akel*, Adrian Caulfield, Todor Mollov, Rajesh Gupta, Steven Swanson. Non-Volatile Systems Laboratory, Department of Computer Science and Engineering, University of California, San Diego. *Now at Micron Technology.

  2. 4 KB Operation Request Latencies
     [Figure: write and read request latency in us (log scale, 0.01 to 10000) for Disk, Flash, Current PCM, and Projected PCM.]

  3. Advantages of Studying PCM SSDs
     • Understand current PCM performance
       – With current storage infrastructure
       – Versus other NV technologies, e.g., flash SSDs
     • PCM performance may differ from simulation
       – Variance in write latency due to data
       – Wear-out characteristics
     • Use real applications to gauge performance
     • Understand how software should change for PCM
     • Prepare to integrate future-generation PCM

  4. Overview
     • Motivation
     • PCM Devices
       – Technology Overview
       – Micron P8P Devices
     • Onyx Architecture
       – Logical Architecture
       – PCM DIMMs
       – Physical Architecture
     • Performance Analysis
     • Applications and Conclusions

  5. PCM: The Device Level
     • PCM storage medium: Chalcogenide
       – Resistance depends on molecular phase
     • Writes
       – Heaters are attached to the chalcogenide
       – Current passed through heaters to change phase
       – Allows bit-alterable writes
     • Reads
       – Measure resistance through chalcogenide area
       – Resistance sensed by ability to sink current
     [Figure credit: M. Breitwisch et al., VLSI '07.]

  6. PCM: The Device Level
     • PCM storage medium: Chalcogenide
       – Resistance depends on molecular phase
     • Writes
       – Heaters are attached to the chalcogenide
       – Current passed through heaters to change phase
       – Allows bit-alterable writes
     • Reads
       – Measure resistance through chalcogenide area
       – Resistance sensed by ability to sink current
     [Figure: XRD measurements of the amorphous, fcc, and hexagonal phases. Credit: M. Wuttig et al., FP6 Project CAMELS.]

  7. PCM Write Operations in Depth
     • Material heated to…
       – > 600 °C then cooled quickly → Amorphous
       – ~ 350 °C then cooled slowly → Crystalline
     • Set and reset
       – Reset: 0 state
       – Set: 1 state
     [Figure: heating pulse diagram; surviving labels are 10 ns and 50-150 ns.]

  8. PCM Projections
     • Future PCM latency projections*:
       – Read: 48 ns
       – Set: 150 ns
       – Reset: 40 ns
     • Process node progression: 90, 45, 32, 20, 9 nm
     *B. C. Lee et al. Architecting Phase Change Memory as a Scalable DRAM Alternative. ISCA 2009.

  9. P8P PCM
     • First-generation NOR-flash replacement
     • Part: NP8P128A13B1760E (P8P)
     • Process node: 90 nm
     • Capacity: 16 MB
     • Per-device bandwidth, latency, current
       – Write (64 bytes): 0.5 MB/s, 120 us, 35 mA
       – Read (16 bytes): 48.6 MB/s, 314 ns, 15 mA
     • Lifetime: One million writes until first bit error
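The per-device bandwidth figures on the P8P slide can be roughly reproduced from the quoted latencies. The sketch below assumes one operation is outstanding per device at a time and that "MB" means 2^20 bytes; both are assumptions, not statements from the slides, and the function name is illustrative.

```python
# Rough check of the P8P per-device bandwidth numbers.
# Assumptions (not from the slides): operations are issued back to back,
# one at a time, and 1 MB = 1024 * 1024 bytes.
MIB = 1024 * 1024


def bandwidth_mib_per_s(bytes_per_op: int, latency_s: float) -> float:
    """Bandwidth when each operation moves bytes_per_op and takes latency_s."""
    return bytes_per_op / latency_s / MIB


write_bw = bandwidth_mib_per_s(64, 120e-6)   # 64-byte write, 120 us
read_bw = bandwidth_mib_per_s(16, 314e-9)    # 16-byte read, 314 ns

print(f"write: {write_bw:.2f} MB/s")  # prints "write: 0.51 MB/s"
print(f"read:  {read_bw:.1f} MB/s")   # prints "read:  48.6 MB/s"
```

Under these assumptions the results line up with the 0.5 MB/s and 48.6 MB/s quoted for a single device.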

  10. Overview
      • Motivation
      • PCM Devices
        – Technology Overview
        – Micron P8P Devices
      • Onyx Architecture
        – Logical Architecture
        – PCM DIMMs
        – Physical Architecture
      • Performance Analysis
      • Applications and Conclusions

  11. Moneta: SSD for Emulated Fast NVMs
      • DRAM-based NV-SSD emulator
      • Learn by building
        – Hardware: controller & interconnect
        – Software: driver, file system, apps
      • Uses optimized software stack
        – Decreases request latency
        – Improves request concurrency
      [Diagram labels: Application, File System, OS IO Stack, Moneta Driver; host CPU and DRAM; PCIe; Moneta with DRAM modules.]

  12. Onyx: Phase-Change Memory SSD
      • Based on Moneta*
        – Shares hardware
        – Shares software stack
      • PCM replaces DRAM
        – Uses real PCM
        – Custom PCM controller
      [Diagram labels: Application, File System, OS IO Stack, Onyx Driver; host CPU and DRAM; PCIe; Onyx with PCM modules.]
      *A. M. Caulfield et al. Moneta: A high-performance storage array architecture for next-generation, non-volatile memories. MICRO 2010.

  13. Moneta/Onyx Architecture
      [Diagram labels: Host via PIO, Request Queue, Scoreboard, Tag Status Registers, Transfer Buffers, DMA Control, Host via DMA, Ring Control, Ring (4 GB/s), and four 2 GB PCM blocks.]

  14. Onyx PCM Controller
      • Request Completion
        – Late Completion: on PCM write completion
        – Early Completion: on request reception
      • Start-Gap Wear Leveling*
        – Low-overhead wear leveling (two registers + logic)
        – Prevents hot spots from wearing out memory
        – Rotates line in memory every gap interval
      *M. K. Qureshi et al. Enhancing lifetime and security of PCM-based main memory with start-gap wear leveling. MICRO 42.
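The start-gap idea named on the controller slide can be sketched in a few lines. This is a minimal software model, following the scheme as generally described in the cited Qureshi et al. paper: N logical lines live in N + 1 physical lines, a `start` register and a `gap` register define the mapping, and the empty gap line moves by one position every gap interval. The class and method names are hypothetical, and this is not the Onyx controller's actual logic.

```python
class StartGap:
    """Toy model of start-gap wear leveling (illustrative, not Onyx's RTL)."""

    def __init__(self, num_lines: int, gap_interval: int):
        self.n = num_lines
        self.gap_interval = gap_interval   # writes between gap movements
        self.start = 0                     # register 1: rotation offset
        self.gap = num_lines               # register 2: index of the empty line
        self.mem = [None] * (num_lines + 1)
        self.writes = 0

    def physical(self, logical: int) -> int:
        """Map a logical line to a physical line, skipping over the gap."""
        pa = (logical + self.start) % self.n
        return pa + 1 if pa >= self.gap else pa

    def read(self, logical: int):
        return self.mem[self.physical(logical)]

    def write(self, logical: int, value) -> None:
        self.mem[self.physical(logical)] = value
        self.writes += 1
        if self.writes % self.gap_interval == 0:
            self._move_gap()

    def _move_gap(self) -> None:
        if self.gap == 0:
            # Gap wraps around: the last physical line moves to line 0
            # and the whole mapping rotates by one.
            self.mem[0] = self.mem[self.n]
            self.mem[self.n] = None
            self.gap = self.n
            self.start = (self.start + 1) % self.n
        else:
            # Shift the neighbouring line into the gap; the gap moves down.
            self.mem[self.gap] = self.mem[self.gap - 1]
            self.mem[self.gap - 1] = None
            self.gap -= 1


# Usage: hammer one logical line and watch its physical location move.
sg = StartGap(num_lines=8, gap_interval=4)
for line in range(8):
    sg.write(line, line)
locations = set()
for _ in range(1000):
    locations.add(sg.physical(3))
    sg.write(3, "hot")
print(sorted(locations))
```

In this model a repeatedly written logical line drifts across the physical lines over time, which is the hot-spot protection the slide refers to, while reads through `read()` keep returning the right data.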

  15. Closer Look at a PCM DIMM
      • 8 ranks of 5 PCM devices
        – 64 data bits + 16 ECC bits
        – Effectively 16 ranks per memory interface
      • Shared control and data lines
      • Capacity: 640 MB / DIMM
      [Diagram: five devices sharing Address[0:25]; Device 0 on Data[0:15], Device 1 on Data[16:31], Device 2 on Data[32:47], Device 3 on Data[48:63], Device 4 on Data[64:79].]

  16. Prototyping Advanced SSDs
      • Built on RAMP's BEE3 board
        – Four FPGAs connected in a ring
        – Four DIMM slots per FPGA
        – PCIe 1.1 x8 host connection
      • System capacity: 10 GB
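The capacity figures on the DIMM and BEE3 slides follow from the device counts. The short calculation below uses only numbers from the slides (16 MB devices, 5 devices per rank, 8 ranks per DIMM, 4 DIMM slots on each of 4 FPGAs). The data-only figure at the end is a derived estimate that assumes the 16 ECC bits occupy one fifth of every rank; the slides do not state it directly, although it matches the four 2 GB PCM blocks labeled in the architecture diagram.

```python
# Capacity arithmetic for the Onyx prototype, using figures from the slides.
DEVICE_MB = 16          # P8P device capacity
DEVICES_PER_RANK = 5    # 4 data devices + 1 ECC device (16 bits each)
RANKS_PER_DIMM = 8
DIMMS_PER_FPGA = 4
FPGAS = 4
DATA_BITS, ECC_BITS = 64, 16

dimm_mb = DEVICE_MB * DEVICES_PER_RANK * RANKS_PER_DIMM
system_gb = dimm_mb * DIMMS_PER_FPGA * FPGAS / 1024

# Assumption: ECC takes a fixed 16 of every 80 bits, so 64/80 holds data.
data_gb = system_gb * DATA_BITS / (DATA_BITS + ECC_BITS)

print(dimm_mb)    # 640
print(system_gb)  # 10.0
print(data_gb)    # 8.0
```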

  17. Overview
      • Motivation
      • PCM Devices
        – Technology Overview
        – Micron P8P Devices
      • Onyx Architecture
        – Logical Architecture
        – PCM DIMMs
        – Physical Architecture
      • Performance Analysis
      • Applications and Conclusions

  18. Read Performance
      [Figure: bandwidth (MB/s, 0 to 2000) versus request size (0.5 to 1024 KB) for Onyx, FusionIO, and Moneta.]

  19. Write Performance
      [Figure: bandwidth (MB/s, 0 to 2000) versus request size (0.5 to 1024 KB) for Onyx-Late, Onyx-Early, FusionIO, and Moneta.]

  20. BerkeleyDB Performance
      [Figure: transactions per second (0 to 8000) on the BTree and HashTable BDB benchmarks for Onyx, FusionIO, and Moneta.]

  21. Potential PCM Applications
      • As a read cache
        – First-gen PCM read speeds compete with flash
        – Next-gen PCM should improve read performance
      • Replace DRAM in high-performance apps
        – PCM cost will likely drop below DRAM
        – Will scale aggressively past DRAM
      • Outpace flash in high-performance SSDs
        – Reduces complexity of management
        – Provides higher-rated lifetime
        – Saves power, logic, and design time

  22. Conclusions
      • Onyx designed to maximize PCM performance
      • More improvements possible as PCM scales
        – Onyx architecture will scale with PCM
        – Onyx will benefit from faster reads and writes
      • PCM simplifies SSD management relative to flash and improves small-access performance

  23. Thank You! Questions?
