
The future of graphic and mobile memory for new applications

August 21st, 2016 | JIN KIM | Samsung Electronics


  1. The future of graphic and mobile memory for new applications
     August 21st, 2016 | JIN KIM | Samsung Electronics

  2. Disclaimer
     This presentation is intended to provide information concerning the memory industry. We do our best to make sure that the information presented is accurate and fully up-to-date. However, the presentation may be subject to technical inaccuracies, information that is not up-to-date, or typographical errors. As a consequence, Samsung does not in any way guarantee the accuracy or completeness of the information provided in this presentation. Samsung reserves the right to make improvements, corrections and/or changes to this presentation at any time.
     The information in this presentation or accompanying oral statements may include forward-looking statements. These forward-looking statements include all matters that are not historical facts, including statements regarding Samsung Electronics' intentions, beliefs or current expectations concerning, among other things, market prospects, growth, strategies, and the industry in which Samsung operates. By their nature, forward-looking statements involve risks and uncertainties, because they relate to events and depend on circumstances that may or may not occur in the future. Samsung cautions you that forward-looking statements are not guarantees of future performance and that the actual developments of Samsung, the market, or the industry in which Samsung operates may differ materially from those made or suggested by the forward-looking statements contained in this presentation or in the accompanying oral statements. In addition, even if the information contained herein or the oral statements are shown to be accurate, those developments may not be indicative of developments in future periods.

  3. Contents
     • Memory technology trend
     • High speed graphic technology (>10Gbps)
     • Low power mobile technology (>20%)
     • Conclusion

  4. Memory technology trend

  5. Memory is at the core of new applications
     [Figure: memory-centric computing surrounded by its applications: Autonomous, Virtual Reality, Artificial Intelligence, Computer Vision. "Higher Performance" panel: bandwidth x10, from 30GB/s (GDDR5) to 256GB/s (HBM2). "Lower Power" panel: power efficiency x0.5, with levels 1, 0.7, and 0.5 across LP3, LP4, and LP4X.]
     Source: Samsung

  6. Memory-centric system evolution
     • Extreme B/W, performance/power, data processing, cost-effective solutions
     [Figure: efficiency (performance/power, cost) over time. Stage labels: core clock and multi-core SoC (PC/server, mobile, Gfx); memory wall (data traffic, cost, thermal); memory evolution (value, UX) for A.I., VR/MR, and vision. Memory labels: DDR5/LP5/GDDR6 (lower power, noise immune, high speed) and low cost HBM/PIM (performance extension, off-loading, customized processing).]

  7. Memory technology trend
     • GDDR6 at over 14Gbps, beyond the 10Gbps of GDDR5
     • LP5, 20% more power-efficient than LP4X
     [Figure: two roadmap charts for 2016 to 2020. "Power Efficiency" [mW/GBps], axis 20% to 100%, and "Performance" [Gbps/pin], axis 3 to 15, covering DDR3, DDR4, DDR5, LP3, LP4, LP4X, LP5, GDDR5, and GDDR6.]
     Source: ISCA2016, Samsung

  8. High Bandwidth Memory: HBM
     Benefits: high bandwidth (1TB/s), 8H stacked 20nm 8GB HBM, TSV technology, microbump, 1,024 I/O architecture
     [Figure: cross-section of HBM DRAM stacked on a logic buffer next to the processor, both on a Si interposer over the PCB. Bar charts of HBM versus GDDR5: "Performance" x2.7, "Power Efficiency" x0.8.]
     Source: Samsung

  9. Processing In Memory: PIM
     • Fill the performance gap and deliver energy-efficient solutions
     [Figure: processing in memory, either in DRAM or in the buffer, alongside GPU/VPU, AP, and CPU. Captions: "Better parallelism and lower bus traffic"; "Memory off-loading for lower frequency and power".]
     Source: Samsung

  10. High speed graphic technology (>10Gbps)
      • Graphic application requirement
      • Asymmetric system, crosstalk, EQ tuning
      • GDDR6, low cost HBM, PIM

  11. High speed memory requirement
      • For 4K real infographic virtual reality, 13.2GB and 1TB/s of memory are needed
      • For 4K 3D mixed reality, an additional 3.5GB and 151GB/s of memory are needed
      [Figure: bar charts for "Gaming Virtual Reality memory" (Gfx capacity in GB, B/W in GB/s) and "Mixed Reality memory" (added capacity in GB, B/W in GB/s), each with "Main" and "H/E" bars at QHD, 4K UHD, and 8K UHD.]
      Variable assumptions: poly count, fps, # of textures per fragment, cache hit rate, tri-linear filtering, # of virtual light sources, reflection/refraction ratio, ray bounce depth
      Source: Samsung

  12. Asymmetric system for higher data rate
      • Each side focuses on its own dedicated features to maximize data rate
        ‒ Smart GPU: training (per-bit timing/EQ) to minimize static offset/noise
        ‒ Noise-immune DRAM: minimizing dynamic noise (jitter, ISI/X-talk, clock duty/skew)
      [Figure: GPU-to-DRAM link block diagram. Labels: training (timing/EQ), clock phase controller, and calibration data; noise-immune circuit/PKG, PLL/DLL, data Tx/Rx, CTLE, and a phase detector feeding the EDC pin; signals CA[0:9], CK_t/CK_c, WCK_t/WCK_c, DQ[0:7]; board/PKG SI/PI effects: jitter, ISI, X-talk.]
      Source: Samsung

  13. X-talk reduction for Board/PKG design
      • Small X-talk package: reduction of X-talk with a better return path
      • Crosstalk reduction with coding: 3B4B, 8B9B
      [Figure: panels "Small X-talk PKG requirement", "3B4B encoding", and "GDDR5 Crosstalk Reduction".]
      ICR: Insertion loss to Crosstalk Ratio
      Source: Samsung

  14. DFE for return-loss reduction on system
      • Single-ended signaling requires a noise-immune equalizer
        ‒ DFE* is more suitable than CTLE**
      [Figure: "CTLE & DFE" transceiver path (EQ, FIFO, RX, TX MUX, CLK buffer) with an 8GHz WCK/WCKB divided by 2 to 4GHz, and "Quarter rate DFE with summer in sampler". Captions: "Adopt merged summer/sampler for fast feedback"; "Periodically calibrated by GPU".]
      * Decision Feedback Equalization ** Continuous Time Linear Equalization
      Source: Samsung
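
The idea behind decision feedback equalization can be illustrated with a minimal sketch. This is a generic textbook one-tap DFE, not the quarter-rate circuit shown on the slide; the channel model (one post-cursor tap `H1 = 0.6`), the symbol sequence, and the function name `one_tap_dfe` are all assumptions made for illustration.

```python
def one_tap_dfe(samples, tap):
    """Return (decisions, slicer_inputs) for a 1-tap decision feedback equalizer.

    Each sample is corrected by subtracting tap * (previous decision),
    then sliced to +1 or -1.
    """
    decisions, slicer_inputs = [], []
    prev = 0  # no earlier symbol before the first sample
    for x in samples:
        corrected = x - tap * prev      # cancel ISI left by the previous symbol
        d = 1 if corrected >= 0 else -1
        decisions.append(d)
        slicer_inputs.append(corrected)
        prev = d
    return decisions, slicer_inputs


# Hypothetical channel: each symbol leaks H1 of its amplitude into the next one.
H1 = 0.6
symbols = [1, -1, -1, 1, 1, -1, 1, -1]
received = [s + H1 * (symbols[i - 1] if i > 0 else 0)
            for i, s in enumerate(symbols)]

decisions, equalized = one_tap_dfe(received, H1)

# Worst-case distance from the decision threshold, before and after the DFE.
raw_margin = min(abs(x) for x in received)
dfe_margin = min(abs(x) for x in equalized)
print(decisions == symbols, raw_margin, dfe_margin)
```

In this toy case the worst-case slicer margin grows from roughly 0.4 to roughly 1.0 because the feedback uses already-decided symbols and therefore does not amplify noise the way a linear equalizer can, which is one common argument for preferring DFE on a single-ended link.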

  15. GDDR6 ideas
      • High speed signaling, 14Gbps ~ 16Gbps, 1.35V
        ‒ Low jitter clocking with WCK/byte, per-bit RX/TX equalizer training, X-talk reduction
        ‒ 2 channels with BL16, same clock/ADD freq., twice the WCK/DQ freq.
      Target timing:
        GDDR5: CK 1.75Gbps, CMD 1.75Gbps, ADDR 3.5Gbps, WCK 3.5Gbps, DQ 7Gbps
        GDDR6 (14Gbps ~ 16Gbps): CK 1.75Gbps, CA 3.5Gbps, WCK 7Gbps, DQ 14~Gbps
      [Figure: "WCK Clocking": the GPU (GPLL, TX/RX) drives a 7GHz ~ 8GHz WCK for RD/WR to the noise-immune DRAM; the WCK tree moves from word to byte granularity.]
      Source: Samsung
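
The target-timing figures can be cross-checked with a few ratios. The sketch below uses only the rates listed on the slide, except for the 16-bit channel width, which is an assumption not stated on the slide; the names `GDDR5`, `GDDR6`, `rate_ratios`, and `CHANNEL_WIDTH_BITS` are hypothetical.

```python
# Target rates as listed on the slide (the slide labels all of them in Gbps).
GDDR5 = {"CK": 1.75, "WCK": 3.5, "DQ": 7.0}
GDDR6 = {"CK": 1.75, "WCK": 7.0, "DQ": 14.0}


def rate_ratios(timing):
    """Ratios between the command clock, the data clock, and the data pins."""
    return {
        "WCK/CK": timing["WCK"] / timing["CK"],
        "DQ/WCK": timing["DQ"] / timing["WCK"],
        "DQ/CK": timing["DQ"] / timing["CK"],
    }


g5 = rate_ratios(GDDR5)
g6 = rate_ratios(GDDR6)
print(g5)  # {'WCK/CK': 2.0, 'DQ/WCK': 2.0, 'DQ/CK': 4.0}
print(g6)  # {'WCK/CK': 4.0, 'DQ/WCK': 2.0, 'DQ/CK': 8.0}

# Access granularity per channel for a burst length of 16.
CHANNEL_WIDTH_BITS = 16  # assumption: not given on the slide
BURST_LENGTH = 16        # "BL16" from the slide
bytes_per_burst = CHANNEL_WIDTH_BITS * BURST_LENGTH // 8
print(bytes_per_burst)   # 32
```

The ratios show what "same clock/ADD freq., twice the WCK/DQ freq." means in practice: CK is unchanged, while both WCK and DQ double, so the DQ-to-WCK relationship stays at 2.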

  16. Low cost HBM for consumer segment
      • ~200GBps with a smaller # of TSVs compared to HBM2
        ‒ Cost competitiveness: remove buffer die, reduce # of TSVs, organic interposer, etc.
        ‒ Need inputs from the client segment for specific features
      Comparison:
                      HBM2     Low cost HBM
        I/O           1024     ~512
        Pin speed     2Gbps    3Gbps ~
        BW (GB/s)     256      ~200
        Cost/GB       1        0.X
      Challenges for HBM (numbered on the stack diagram of HBM DRAM, logic buffer, processor, Si interposer, and PCB):
        1. IO reduction, smaller # of TSVs
        2. Remove buffer die
        3. Master/slave structure
        4. Remove ECC
        5. Si or organic interposer
      Source: Samsung
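
The bandwidth row of the comparison follows from the I/O count and pin speed. A minimal sketch of that arithmetic, assuming peak bandwidth is simply pins times per-pin rate divided by 8 bits per byte, and treating the approximate "~512" and "3Gbps ~" entries as exact values (the function name is hypothetical):

```python
def peak_bandwidth_gbytes(io_count, pin_speed_gbps):
    """Peak bandwidth in GB/s: number of I/O pins x per-pin rate (Gbps) / 8."""
    return io_count * pin_speed_gbps / 8


hbm2 = peak_bandwidth_gbytes(1024, 2.0)         # slide: 1024 I/O at 2Gbps
low_cost_hbm = peak_bandwidth_gbytes(512, 3.0)  # slide: ~512 I/O at 3Gbps ~

print(hbm2)          # 256.0
print(low_cost_hbm)  # 192.0
```

Halving the I/O count while raising pin speed by 1.5x gives 192GB/s, consistent with the "~200" on the slide and about 75% of the HBM2 figure.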

  17. PIM, Deep Learning in DRAM
      • Parallel processing in the buffer to reduce extreme bandwidth requirements
        ‒ Convolution, subsampling, matrix calculation
      • Collaborate with accelerator for performance/cost
      [Figure: "Extreme B/W Requirement": from CPU + DRAM, to GPGPU (GPU + HBM/GDDRx), to accelerators + xHBMs*, with the # of cores growing x10 at each step. "Processing in Buffer": data movement reduction by running deep learning convolution/subsampling in the buffer.]
      * xHBM: Extreme HBM
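
Why subsampling in the buffer reduces data movement can be shown with a toy example. The sketch assumes 2x2 max pooling as the subsampling step and a 4x4 feature map with even dimensions; neither choice comes from the slide, and `subsample_2x2_max` is a hypothetical name.

```python
def subsample_2x2_max(fmap):
    """2x2 max pooling over a feature map with even height and width."""
    height, width = len(fmap), len(fmap[0])
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, width, 2)]
        for r in range(0, height, 2)
    ]


# Hypothetical 4x4 feature map held in memory.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 6, 4, 4],
]

pooled = subsample_2x2_max(feature_map)

# Values crossing the memory interface in each scheme.
moved_without_pim = sum(len(row) for row in feature_map)  # host pools: send all
moved_with_pim = sum(len(row) for row in pooled)          # buffer pools: send result
print(pooled, moved_without_pim, moved_with_pim)
```

If the pooling runs next to the memory, only the 4 pooled values leave the stack instead of all 16, a 4x reduction in traffic for this step.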

  18. Low power mobile technology (>20%)
      • Motivation for low power mobile
      • LP4X / LP5
      • PIM

  19. Motivation for low power mobile
      • PC-level graphic performance and mobile power budget
      • Power is continuously increasing with a limited thermal budget
      [Figure: "Performance vs. TDP": GFLOPS (GPU), 1K to 5K, against TDP [Watt], 0 to 300, for mobile, notebook, desktop, and Oculus Rift PC graphic (+GTX card), with a power gap calling for lower power performance design. "Power Dissipation Trend": power dissipation [W], 10^-2 to 10, over years '00 to '20, showing dynamic power, static power, and the thermal limit of a hand-held device.]
      * TDP: Thermal Design Power
      Source: Samsung

  20. Lower power solution, LP4X
      • LP4X: 4266Mbps, VDDQ/VDD = 0.6V/1.1V
        ‒ IO power reduction with 0.6V VDDQ; a good example of a small change with a big gain
      [Figure: "LP4X Power Reduction": bars for LP4 3200, LP4 3733, LP4 4266, and LP4X 4266, annotated 1.1V, 0.6V, -45%, and "18% Total Power Saving". "LP4X Idea": IO driver schematics (pre-driver, MN_UP/MN_DW, channel, R_term, DQ) for LP4 with VDDQ = 1.1V, VOH = VDDQ/3, VREF = VOH/2, and for LP4X with VDDQL = 0.6V from the AP, VOH = VDDQ/2, VREF = VOH/2; the LP4X schematic also carries a "VDDQ (=1.0V)" label. Captions: "Same swing", "Same VOH", "Half-level VDDQ".]
      • Conditions: IDD4R (VDDQ + VDD2) spec value / 50% data change each burst transfer / process node contribution included
      Source: Samsung
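
The -45% and 18% annotations can be related with back-of-envelope arithmetic. The sketch assumes that I/O supply power scales linearly with VDDQ when the output current is unchanged (the "same swing" caption), and that the whole 18% total saving comes from the I/O rail; both are assumptions for illustration, not Samsung's power breakdown, and the conditions line notes that process node contribution is also included.

```python
VDDQ_LP4 = 1.1    # volts, from the slide
VDDQ_LP4X = 0.6   # volts, from the slide
TOTAL_SAVING = 0.18  # "18% Total Power Saving" from the slide


def io_power_reduction(vddq_old, vddq_new):
    """Fractional I/O power drop, assuming P = VDDQ x I with unchanged current."""
    return 1 - vddq_new / vddq_old


io_reduction = io_power_reduction(VDDQ_LP4, VDDQ_LP4X)

# If all of the total saving came from I/O, the I/O share of total power would be:
implied_io_share = TOTAL_SAVING / io_reduction

print(round(io_reduction * 100, 1))      # 45.5
print(round(implied_io_share * 100, 1))  # 39.6
```

Under these assumptions the VDDQ change alone accounts for the roughly 45% figure, and the 18% total would correspond to I/O drawing about 40% of the measured power.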
