Expanding the World of Heterogeneous Memory Hierarchies

The Evolving Non-Volatile Memory Story
Bill Gervasi, Principal Systems Architect
16 May 2019
Topics: Data Tiers, Memory, Checkpointing, Processing, Challenges


  1. Homogeneous Main Memory Combinations

     Memory      Data Safe   Performance   Capacity
     DRAM        No          Best          1.0x
     NVDIMM-N    Yes         Best          0.5x
     Optane      Yes         Worst         10x
     NVDIMM-P    Yes         Mid           10x
     MCS         Yes         Best+         1x+

  2. Heterogeneous Main Memory Combinations

     Memory            Data Safe   Performance   Capacity
     DRAM + Optane     No          High          6x
     DRAM + NVDIMM-P   No          High          6x
     MCS + Optane      Yes         High          6x
     MCS + NVDIMM-P    Yes         High          6x

  3. Heterogeneous vs. Homogeneous Main Memory Combinations
     Heterogeneous: software is encouraged to put critical functions in faster memory; slower memory is often mounted as a RAM drive.
     Homogeneous: software need not care; all functions take the same time.

  4. Software support via DAX assists in moving from mounted drives, to RAM drive, to direct access mode.
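In direct access mode, DAX lets an application map persistent memory and use plain loads and stores, with no page cache in between. A minimal sketch of that access pattern, using Python's `mmap`; the DAX mount point mentioned in the comments is illustrative, and an ordinary temp file stands in so the sketch runs anywhere:

```python
import mmap
import os
import tempfile

# On a DAX-mounted filesystem (e.g. ext4/xfs mounted with -o dax over a
# /dev/pmem device; paths here are illustrative assumptions), this mapping
# would give direct load/store access to the persistent media.
path = os.path.join(tempfile.gettempdir(), "pmem_demo.bin")

with open(path, "wb") as f:
    f.write(b"\x00" * 4096)          # size the backing file first

fd = os.open(path, os.O_RDWR)
try:
    m = mmap.mmap(fd, 4096)          # MAP_SHARED by default
    m[0:5] = b"hello"                # a plain store, no read()/write() syscalls
    m.flush()                        # on real pmem: ensure stores reach media
    print(bytes(m[0:5]))             # b'hello'
    m.close()
finally:
    os.close(fd)
    os.remove(path)
```

The point of direct access mode is exactly this: the slice assignment touches the media-backed mapping itself, rather than a page-cache copy that must later be written back.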

  5. The Power of Zero Power

  6. Putting a Node to Sleep: Operating Mode vs. Self Refresh Mode. "Instant On" means power must stay alive, and refresh operations burn significant power.

  7. Memory Class Storage can be turned off entirely: Operating Mode vs. Power Off.

  8. DDR5 memory modules have on-DIMM voltage regulation (PMIC), so DIMM power may be shut off independently of system power. [Diagram: memory module with memory media, PMIC, and data buffers; module power regulated on-DIMM, system power from the motherboard]

  9. Multiple power management options: system power off, both DIMMs off; system power on, both DIMMs off; system power on, DIMM1 on and DIMM2 off. [Diagram: two DIMMs, each with its own PMIC-controlled module power]

  10. Nantero NRAM™: my favorite NVRAM. Full presentation on Wednesday…

  11. A van der Waals energy barrier keeps the carbon nanotubes (CNTs) apart or together. Data retention >300 years @ 300 °C, >12,000 years @ 105 °C. Each cell is a stochastic array of hundreds of nanotubes. [Diagram: CNT cell between two electrodes]

  12. No temperature sensitivity; 5 ns balanced read/write performance.

  13. NRAM Data Retention = 12,000 Years. [Timeline: 10,000 years ago, 4,500 years ago, 2,500 years ago]

  14. A 64 Kb tile replicated 256 K times yields a 16 Gb device. Array size is tuned to the size of the drivers and receivers, and chip-level timing is a function of bit line flight times. Replicate this "tile" as needed for device capacity, then add I/O drivers to emulate any PHY needed. [Diagram: X-Y tile array with drivers, receivers, and an I/O PHY, stacked in Z as NRAM layers]
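The capacity claim on this slide is simple arithmetic, worth checking once: a 64 Kbit tile replicated 256 K times is 2^16 × 2^18 = 2^34 bits, i.e. 16 Gbit.

```python
# Arithmetic behind the tiling claim: 64 Kb tile x 256 K tiles = 16 Gb device.
TILE_BITS = 64 * 1024           # 64 Kb per tile  (2**16)
TILE_COUNT = 256 * 1024         # 256 K tiles     (2**18)

device_bits = TILE_BITS * TILE_COUNT
print(device_bits // 2**30)     # 16  (Gbit)
```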

  15. [Block diagram: DDR4/DDR5 NRAM built from carbon nanotube arrays, with row/column/bank decode, a SECDED ECC engine (72 bits stored per 64 data bits), address FIFOs, a chip-ID die selector, and x4/x8 data plus strobe I/O]

  16. Architectural improvements improve data throughput 15% or greater at the same clock frequency:
      - Elimination of refresh
      - Elimination of tFAW restrictions
      - Elimination of bank group restrictions
      - Elimination of power states
      - Elimination of inter-die delays
      [Chart: DDR4/DDR5 base throughput vs. NRAM at 15-20% higher; larger is better]

  17. NVRAM Memory Class Storage plugs into an RDIMM slot and appears to the CPU as DRAM; the memory controller may optionally be tuned for NVRAM. [Diagram: RDIMM populated entirely with NRAM devices]

  18. One less layer of marshmallows to deal with: persistence goes from non-deterministic to fully deterministic.

  19. [Image-only slide]

  20. Know Your Enemy. Would you rather step on broken glass? A LEGO? Or some jacks?

  21. …about those energy stores… Batteries, supercapacitors, tantalums (etc.)

  22.  Batteries              Supercapacitors       Tantalums (etc.)
       High capacity          Medium capacity       Low capacity
       High energy density    Low energy density    Low energy density
       Low reliability        Degrade over time     …but stable

  23. Energy needed for backup of the DRAM cache. [Diagram: storage class memory module with DRAM cache, energy storage, controller, flash or SCM media, and I/O]

  24. NVRAM eliminates the need for backup energy, leaving more room for storage. [Diagram: NVRAM replaces both the DRAM cache and the energy storage on the module]

  25. NVRAM Changes the Math. A DRAM cache is limited to about 1 GB per TB by the energy available; with no DRAM, cache size is dictated by cost/performance.
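The 1 GB-per-TB rule of thumb comes from holdup energy: the backup source must power the module long enough to flush the DRAM cache to flash. A rough sketch of that relationship; the flush bandwidth and holdup power below are illustrative assumptions, not figures from the deck:

```python
# Illustrative numbers only; bandwidth and power are assumptions.
cache_bytes = 1 * 2**30          # 1 GiB DRAM cache (the slide's rule of thumb)
flush_bw_bps = 500e6             # assumed 500 MB/s sustained flash write
holdup_power_w = 10.0            # assumed module power during the backup flush

flush_time_s = cache_bytes / flush_bw_bps
energy_joules = holdup_power_w * flush_time_s
print(round(flush_time_s, 2), "s,", round(energy_joules, 1), "J")
# -> 2.15 s, 21.5 J of stored energy needed to ride out the flush.
# With NVRAM in place of the DRAM cache, nothing needs flushing: 0 J.
```

Doubling the cache doubles both the flush time and the stored energy required, which is why the DRAM cache, not the flash, caps out first.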

  26. Switching gears again… to Systems Evolution.

  27. Pop quiz: how many CPUs in a 1980s PC?

  28. One? (Modem, graphics adapter, network adapter, Sound Blaster…)

  29. They were called "DSPs", Digital Signal Processors: analog front end devices and drivers that put processing next to the data. They were killed by "Native Signal Processing".

  30. With NSP… so why do it? [Slide: dollar and wattage icons]

  31. Now We Are Trending Back. [Diagram: fabric linking CPU, memory, storage, FPGA, and AI elements]

  32. [Image-only slide]

  33. [Diagram: low latency fabric connecting processor elements, I/O elements, and storage elements through bridges]

  34. Distributed resources: application-specific computing, in-memory computing, artificial intelligence and deep learning, security.

  35. [Diagram: low latency fabric connecting a standard CPU (human interface management, HTML processing), a graphics accelerator (human interface), a search engine, an artificial intelligence accelerator, a network adapter, a memory array, and filesystem-aware storage]

  36. Example AI accelerator: an NNP with a control unit and execution units, each with HBM and SRAM; SIMD architectures, matrix interconnections, and Tbps I/O links. Fast pipes still limit load/save time. Challenges: model checkpointing, data loss on power fail, temperature sensitivity.

  37. Back propagation algorithms complicate things: data loss problems are amplified, and checkpointing consumes significant time and bandwidth.
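The bandwidth cost of checkpointing is easy to bound: time stalled is roughly accelerator state size divided by effective link bandwidth. A back-of-the-envelope sketch; the state size and link speed are assumed values for illustration, not figures from the deck:

```python
# Why checkpointing is bandwidth-bound: time ~= state size / link bandwidth.
# Both numbers below are illustrative assumptions.
model_state_gb = 100        # assumed accelerator state to checkpoint (GB)
link_gbps = 100             # assumed effective checkpoint link bandwidth (Gb/s)

stall_seconds = model_state_gb * 8 / link_gbps   # GB -> Gb, then / (Gb/s)
print(stall_seconds)                             # 8.0 seconds per checkpoint
```

At these rates a checkpoint every few minutes costs several percent of wall-clock time, which is why persistent memory close to the execution units is attractive.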
