Slide 64: Homogeneous Main Memory Combinations

  Memory     Data Safe  Performance  Capacity
  DRAM       No         Best         1.0x
  NVDIMM-N   Yes        Best         0.5x
  Optane     Yes        Worst        10x
  NVDIMM-P   Yes        Mid          10x
  MCS        Yes        Best+        1x+
Slide 65: Heterogeneous Main Memory Combinations

  Memory           Data Safe  Performance  Capacity
  DRAM + Optane    No         High         6x
  DRAM + NVDIMM-P  No         High         6x
  MCS + Optane     Yes        High         6x
  MCS + NVDIMM-P   Yes        High         6x
Slide 66: Heterogeneous vs. Homogeneous Main Memory Combinations

  Heterogeneous: software is encouraged to put critical functions in the
  faster memory; the slower memory is often mounted as a RAM drive.
  Homogeneous: software need not care; all functions take the same time.
Slide 67: Software support via DAX assists in moving from mounted drives, to a RAM drive, to direct access mode.
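Direct access mode means the application maps persistent memory into its address space and touches it with loads and stores instead of read()/write() syscalls. A minimal sketch of that access pattern follows; on a real system the file would live on a filesystem mounted with the `dax` option (the path here is illustrative, and a temporary file is used so the sketch runs anywhere):

```python
import mmap
import os
import tempfile

# On a persistent-memory system this file would sit on a DAX-mounted
# filesystem (mount -o dax); a temp file stands in for it here.
path = os.path.join(tempfile.mkdtemp(), "pmem_region")

# Create and size the backing file.
with open(path, "wb") as f:
    f.truncate(4096)

# Map it into the address space and access it with plain stores/loads.
with open(path, "r+b") as f:
    m = mmap.mmap(f.fileno(), 4096)
    m[0:5] = b"hello"   # store-instruction access, no write() syscall
    m.flush()           # on DAX this pushes CPU caches to the media
    data = bytes(m[0:5])
    m.close()

print(data)
```

The point of the sketch is the progression on the slide: the same bytes can be reached through a mounted drive (file I/O), a RAM drive, or, with DAX, directly through the mapping with no page-cache copy in between.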
Slide 68: The Power of Zero Power
Slide 69: Putting a Node to Sleep. In Operating Mode and Self-Refresh Mode alike, "Instant On" means power must stay alive, and refresh operations burn significant power.
Slide 70: Memory Class Storage can be turned off entirely (Operating Mode vs. Off).
Slide 71: DDR5 memory modules have on-DIMM voltage regulation (a PMIC), so DIMM power may be shut off independently of system power. (Diagram: module power, PMIC, memory media, and data buffers on the DIMM; system power supplied by the motherboard.)
Slide 72: Per-DIMM PMICs allow multiple power management options: system power off with both DIMMs off; system power on with both DIMMs off; or system power on with DIMM1 on and DIMM2 off. (Diagram: two DIMMs, each with its own PMIC, module power, memory media, and data buffers.)
Slide 73: Nantero NRAM™, my favorite NVRAM. Full presentation on Wednesday…
Slide 74: A van der Waals energy barrier keeps the CNTs apart or together. Data retention is >300 years @ 300 °C and >12,000 years @ 105 °C. Each cell is a stochastic array of hundreds of nanotubes. (Diagram: CNTs between two electrodes.)
Slide 75: No temperature sensitivity; 5 ns balanced read/write performance.
Slide 76: NRAM Data Retention = 12,000 Years. (Timeline images: 10,000 years ago, 4,500 years ago, 2,500 years ago.)
Slide 77: A 64 Kb tile, replicated 256 K times across the NRAM layer (X, Y, Z), yields 16 Gb. The array size is tuned to the size of the drivers and receivers, and chip-level timing is a function of bit-line flight times. Replicate this "tile" as needed for device capacity, then add I/O drivers to emulate any PHY needed.
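The tile arithmetic on the slide checks out; a two-line sanity check:

```python
# Capacity check for the tiled NRAM layout described on the slide:
# a 64 Kb tile replicated 256 K times yields a 16 Gb device.
TILE_BITS = 64 * 1024           # one 64 Kb tile
TILE_COUNT = 256 * 1024         # 256 K tile instances
device_bits = TILE_BITS * TILE_COUNT

print(device_bits // 2**30)     # device capacity in Gb
```

Because both factors are powers of two (2^16 bits x 2^18 tiles = 2^34 bits), the result is exactly 16 Gib.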
Slide 78: NRAM die organization vs. DDR4/DDR5: carbon nanotube arrays with row, column, and bank decode; a SECDED ECC engine (72 bits in, 64 bits out); address and data FIFOs; a chip-ID die selector; and x4/x8 data with strobes.
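The 72-bit-to-64-bit path is the classic SECDED ratio: 64 data bits plus 8 check bits, enough to correct any single-bit error and detect any double-bit error. A minimal extended-Hamming sketch of the principle (an illustration, not Nantero's actual ECC engine):

```python
# SECDED over a 64-bit word: 7 Hamming check bits at power-of-two
# positions 1..64 plus an overall parity bit at position 0 = 72 bits.
CHECKS = (1, 2, 4, 8, 16, 32, 64)
DATA_POS = [i for i in range(1, 72) if i & (i - 1)]  # non-power-of-two slots

def secded_encode(data64):
    code = [0] * 72
    for i, p in enumerate(DATA_POS):          # scatter the 64 data bits
        code[p] = (data64 >> i) & 1
    for c in CHECKS:                          # even parity over each group
        code[c] = sum(code[p] for p in range(1, 72) if p & c) & 1
    code[0] = sum(code) & 1                   # overall parity for DED
    return code

def secded_decode(code):
    syndrome = 0
    for c in CHECKS:
        if sum(code[p] for p in range(1, 72) if p & c) & 1:
            syndrome |= c
    parity = sum(code) & 1
    if syndrome and parity:                   # single-bit error: fix it
        code = code[:]
        code[syndrome] ^= 1
    elif syndrome:                            # double-bit error: detected only
        raise ValueError("uncorrectable double-bit error")
    return sum(code[p] << i for i, p in enumerate(DATA_POS))

word = 0x0123456789ABCDEF
flipped = secded_encode(word)
flipped[9] ^= 1                               # inject a single-bit error
print(hex(secded_decode(flipped)))            # recovered transparently
```

The syndrome equals the error position directly because each bit position participates in exactly the parity groups named by its binary representation.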
Slide 79: At the same clock frequency, architectural improvements give NRAM 15-20% greater data throughput than DDR4/DDR5 base throughput: elimination of refresh, of tFAW restrictions, of bank-group restrictions, of power states, and of inter-die delays. (Chart: bandwidth, larger is better.)
Slide 80: NVRAM Memory Class Storage plugs into an RDIMM slot and appears to the CPU as DRAM; the memory controller may optionally be tuned for NVRAM. (Diagram: a DIMM fully populated with NRAM devices.)
Slide 81: One less layer of marshmallows to deal with: persistence goes from non-deterministic to fully deterministic.
Slide 83: Know Your Enemy. Would you rather step on broken glass? A LEGO? Or some jacks?
Slide 84: …about those energy stores… batteries, supercapacitors, and tantalums (etc.)
Slide 85:

  Energy store      Capacity  Energy density  Reliability
  Batteries         High      High            Low
  Supercapacitors   Medium    Low             Degrade over time
  Tantalums (etc.)  Low       Low             …but stable
Slide 86: Energy is needed for backup of the DRAM cache. (Diagram: a storage controller with DRAM, an energy store, I/O, and flash or storage class memory.)
Slide 87: NVRAM eliminates the need for backup energy and leaves more room for storage. (Diagram: NVRAM replaces the DRAM cache, and the energy store is crossed out; the controller keeps its I/O and flash or storage class memory.)
Slide 88: NVRAM Changes the Math. A DRAM cache is limited by the energy available for backup, roughly 1 GB per TB; with no DRAM, cache size is dictated by cost/performance instead.
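The "limited by energy available" constraint can be made concrete: the hold-up energy buys a fixed number of seconds of flush time, and flush bandwidth converts those seconds into a maximum cache size. The numbers below are illustrative, not from the slides:

```python
# How much DRAM write cache can be flushed to flash before the
# backup energy store runs out? (Illustrative figures only.)
def flushable_cache_gb(energy_joules, flush_bw_gbps, flush_power_w):
    seconds = energy_joules / flush_power_w   # runtime on stored energy
    return seconds * flush_bw_gbps            # GB moved in that time

# e.g. 50 J of capacitor energy, 2 GB/s flush rate, 10 W during flush:
print(flushable_cache_gb(50, 2.0, 10.0))      # 10.0 GB ceiling
```

With NVRAM the cache contents are already persistent, so this ceiling disappears and the cost/performance trade-off alone sizes the cache.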
Slide 89: Switching gears again… to Systems Evolution.
Slide 90: Pop quiz: how many CPUs were in a 1980s PC?
Slide 91: One? No: the modem, graphics adapter, network adapter, and Sound Blaster each had a processor of their own.
Slide 92: They were called "DSPs," Digital Signal Processors, and they put processing next to the data (drivers, analog front-end devices). They were killed by "Native Signal Processing."
Slide 93: With NSP, the dollars ($) spent on DSPs became watts (W) burned on the CPU. So why do it?
Slide 94: Now We Are Trending Back. (Diagram: CPU, memory, storage, FPGA, and AI elements attached to a fabric.)
Slide 96: (Diagram: processor elements, I/O elements, and storage elements, each behind a bridge, connected by a low-latency fabric.)
Slide 97:
  - Distributed resources
  - Application-specific computing
  - In-memory computing
  - Artificial intelligence and deep learning
  - Security
Slide 98: (Diagram: a low-latency fabric connects a standard CPU for human-interface management and HTML processing, a graphics accelerator, a search engine, an artificial intelligence accelerator, a network adapter, a memory array, and filesystem-aware storage.)
Slide 99: Example AI accelerator (NNP): Tbps I/O links feed a control unit and exec units, each with HBM and SRAM; SIMD architectures with matrix interconnections. Fast pipes still limit load/save time. Challenges: model checkpointing, data loss on power fail, and temperature sensitivity.
Slide 100: Back-propagation algorithms complicate things: data-loss problems are amplified, and checkpointing is highly time- and bandwidth-consuming.
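The cost of checkpointing is exactly why NVRAM matters here: the longer a checkpoint takes, the less often you can afford one, and the more work a power failure destroys. Young's classic approximation captures the trade-off between checkpoint cost and failure rate (the numbers below are illustrative, not from the talk):

```python
import math

# Young's approximation for the optimal checkpoint interval:
# interval ~= sqrt(2 * C * MTBF), where C is the time to write one
# checkpoint and MTBF is the mean time between failures.
def optimal_checkpoint_interval(checkpoint_seconds, mtbf_seconds):
    return math.sqrt(2 * checkpoint_seconds * mtbf_seconds)

# A model that takes 60 s to checkpoint on a node with a 24 h MTBF:
print(round(optimal_checkpoint_interval(60, 24 * 3600)), "seconds")
```

Cutting the checkpoint cost C (for example, by writing state to persistent memory instead of remote storage) shrinks the optimal interval, so less work is at risk at any moment.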