

  1. Emerging NVM Enabled Storage Architecture: From Evolution to Revolution. Yiran Chen, Electrical and Computer Engineering, University of Pittsburgh. Sponsors: NSF, DARPA, AFRL, and HP Labs.

  2. Outline • Introduction • Evolution with eNVM: – On-chip high-speed storage; – Off-chip secondary storage; • Revolution with eNVM: – Memristor-based neuromorphic accelerator • Conclusion

  3. Conventional Memory Scaling
  [Chart: DRAM capacity (Mb/chip), cell aspect ratio (A/R), and oxide thickness (Tox, from 11Å down toward 3Å) vs. technology node (60nm to 20nm), spanning EDO, SDRAM (133 Mbps), DDR1 (200-400), DDR2 (400-800), and DDR3 (800-1600), with the Burj Khalifa as an aspect-ratio comparison. Key message: the intrinsic difficulty of charge-based computing and storage.]

  DRAM roadmap:
           2012-2013     2014-2015     2016-2017                  2018-2019
  Node     38nm-32nm     29nm-22nm     22nm-16nm                  16nm-14nm
  M        Stacked MIM   Stacked MIM   Stacked MIM                FBRAM, STT-RAM, RRAM, PCRAM
  P        Planar        Planar, HKMG  Planar                     Planar
  A        6F², bWL      6F², bWL      6F², bBL, LBL, 1T1C(VFET)  4F², 1T, 1T1R, 1TMTJ (VFET)
  G        poly/SiO2     HKMG          HKMG                       HKMG
  C        Si            Si            Si                         Si
  V        1.35V         1.2V          1.1V                       ~1V
  Sources: ASML, ITRS, IMEC, Hynix, IBM

  4. Emerging Nonvolatile Memory

  5. Memory Technologies Comparison

                         SRAM     DRAM     NAND FLASH  STT-RAM  PCRAM    ReRAM
  Data retention         N        64 ms    >10 y       >10 y    >10 y    >10 y
  Memory cell (F²)       120-140  7-9      4           8        4        <1
  Read time              0.2 ns   2 ns     0.1 ms      5-10 ns  12 ns    5-10 ns
  Write/erase time       70 ps    1 ns     1/0.1 ms    <10 ns   <50 ns   <10 ns
  Number of rewrites     10^16    10^16    10^5        10^15    10^8     10^15
  Power (read/write)     Low      Low      High        Low      Low      Low
  Power other than R/W   Leakage  Refresh  None        None     None     None
                         current  power

  Source: ITRS ERD workshop presentation by Prof. Y. Chen

  6. Challenges: • Identifying the evolutionary applications that can – be easily and seamlessly integrated into the current memory hierarchy and computing platform; – fully leverage the advantages of emerging NVM; – not be easily replaced by alternative technologies or architectures. • Inventing a revolutionary computing and storage architecture that can – offer a high-performance, power-efficient, and scalable computing model; – provide a truly seamless integration between computing and memory.

  7. Outline • Introduction • Evolution with eNVM: – On-chip high speed storage; • STT-RAM based 3D cache for CPU. • Racetrack based register file for GPU. – Off-chip secondary storage; • Revolution with eNVM: – Memristor-based neuromorphic accelerator. • Conclusion

  8. STT-RAM based 3D cache. Spin-Transfer Torque Random Access Memory. • Magnetic tunneling junction (MTJ): a free layer and a reference layer separated by an MgO barrier. • 1T-1MTJ cell schematic: one access transistor connects the MTJ between a bit-line and a source-line, gated by the word-line. • Writing '1' and writing '0' are done by driving the write current through the MTJ in opposite directions. • A scalable technology.
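
To make the write mechanism concrete, here is a toy Python model of a 1T-1MTJ cell. The resistance values and the convention that the parallel (low-resistance) state encodes '1' are illustrative assumptions; the slide does not fix the mapping.

```python
R_LOW, R_HIGH = 1e3, 2e3   # illustrative MTJ resistances (ohms), assumed values

class MTJCell:
    """Toy model of the 1T-1MTJ cell on the slide."""

    def __init__(self):
        self.parallel = True   # free layer aligned with the reference layer

    def write(self, bit):
        # Spin-transfer torque: current from bit-line to source-line sets one
        # magnetization polarity; the reverse current sets the other.
        self.parallel = (bit == 1)

    def resistance(self):
        return R_LOW if self.parallel else R_HIGH

    def read(self):
        # A sense amplifier compares the cell against a reference resistance.
        return 1 if self.resistance() < (R_LOW + R_HIGH) / 2 else 0
```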

  9. SRAM vs. MRAM (STT-RAM)

                   SRAM      MRAM
  Area (65nm)      3.66 mm²  3.30 mm²
  Capacity/bank    128 KB    512 KB
  Read latency     2.25 ns   2.32 ns
  Write latency    2.26 ns   11.02 ns
  Read energy      0.90 nJ   0.86 nJ
  Write energy     0.80 nJ   5.00 nJ

  Cache configurations and leakage power: 2MB (16x128KB) SRAM cache, 2.09 W; 8MB (16x512KB) MRAM cache, 0.26 W.
  • Pros: low leakage power, high density. • Cons: long write latency and large write power.

  10. STT-RAM based 3D cache • Baseline 3D architecture: – Core layer + cache layers. – NUCA caches with NoC connections. [Figure: cache banks on layers 1 and 2 connected by routers; vertical hops through TSVs, horizontal hops through the NoC; data migration between banks; cache controller on the core layer.] G. Sun, X. Dong, Y. Xie, J. Li, Y. Chen, HPCA, 2009.

  11. STT-RAM based 3D cache • Challenge: the long write latency of STT-RAM. • Solution 1 (S1): read-preemptive write buffer. Writes from the cores are queued in a FIFO write buffer before being issued to the STT-RAM caches; an incoming read request may preempt a write operation that has just begun, while a write that is almost done is allowed to finish before the read is served.
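
A minimal sketch of that preemption policy, assuming a completion-fraction threshold; the threshold value, the cycle bookkeeping, and the bank.read interface are illustrative, not taken from the paper.

```python
from collections import deque

WRITE_LATENCY = 11        # cycles; ~11 ns STT-RAM write from slide 9
PREEMPT_THRESHOLD = 0.8   # assumed: writes more than 80% done may finish

class ReadPreemptiveWriteBuffer:
    def __init__(self):
        self.pending_writes = deque()   # FIFO write buffer
        self.active_write = None        # (address, cycles_completed)

    def enqueue_write(self, addr):
        self.pending_writes.append(addr)

    def on_read_request(self, addr, bank):
        """Serve a read, preempting the in-flight write if it just began."""
        if self.active_write is not None:
            w_addr, done = self.active_write
            if done / WRITE_LATENCY < PREEMPT_THRESHOLD:
                # Write just began: cancel it and retry it later.
                self.pending_writes.appendleft(w_addr)
                self.active_write = None
            # Otherwise the write is almost done; let it complete first.
        return bank.read(addr)   # hypothetical STT-RAM bank interface
```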

  12. STT-RAM based 3D cache • Solution 2 (S2): SRAM-MRAM hybrid L2 cache. Instead of a 32-way STT-RAM cache, each set uses 31 ways of STT-RAM plus 1 way of SRAM; the cores reach the SRAM and MRAM banks through TSVs. A sketch of one possible placement policy follows.
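
The slide shows only the 31-way STT-RAM + 1-way SRAM organization, so the policy below, where write-hot lines are swapped into the SRAM way after a counter threshold, is a plausible sketch rather than the paper's exact mechanism.

```python
SRAM_WAY = 0                 # way 0 is SRAM; ways 1..31 are STT-RAM
WRITE_HOT_THRESHOLD = 4      # assumed migration trigger

class HybridCacheSet:
    def __init__(self, num_ways=32):
        self.tags = [None] * num_ways
        self.write_counts = [0] * num_ways

    def on_write_hit(self, way):
        self.write_counts[way] += 1
        if way != SRAM_WAY and self.write_counts[way] >= WRITE_HOT_THRESHOLD:
            self.migrate_to_sram(way)

    def migrate_to_sram(self, way):
        # Swap the write-hot STT-RAM line with the current SRAM occupant,
        # so future writes to it hit the fast, write-cheap SRAM way.
        self.tags[SRAM_WAY], self.tags[way] = self.tags[way], self.tags[SRAM_WAY]
        self.write_counts[SRAM_WAY] = self.write_counts[way] = 0
```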

  13. STT-RAM based 3D cache • Results (S1 & S2): – Performance is improved by 4.91% compared with the STT-RAM baseline. – Power consumption is reduced by 73.5%. [Charts: normalized IPC and power for 2M-SRAM-DNUCA, 8M-MRAM-DNUCA, and 8M-Hybrid-DNUCA.]

  14. Outline • Introduction • Evolution with eNVM: – On-chip high speed storage; • STT-RAM based 3D cache for CPU. • Racetrack based register file for GPU. – Off-chip secondary storage; • Revolution with eNVM: – Memristor-based neuromorphic accelerator. • Conclusion

  15. Racetrack for GPU • Racetrack: a magnetic track that stores many bits as magnetic domains. • Racetrack cell: bit-line (BL), write word-line (WWL), read word-line (RWL), and source-line (SL); the access port stacks a reference layer and a free layer, with two fixed pinning regions delimiting the free region of the track. • Write '0' / write '1' through the access port. • Inject current along the track to shift the stored cells past the port. • Read through the access port.

  16. Racetrack for GPU • Benefits from racetrack: extremely small cell size. • Major challenge: delay/energy caused by shifting the track. • Warp register remapping (WRR): – 60.0% of the RF is allocated during execution; – a non-optimal warp register mapping gives a maximum shift distance of 8 cells; – WRR interleaves the warp registers across the access ports, cutting the maximum shift distance to 4 cells (see the sketch below). [Figure: RF bank with arbitrator, shift controller, write/read/shift drivers, row decoder driving WWL/RWL, BL/SL columns, column mux, and sense amplifier arrays.] M. Mao, W. Wen, Y. Zhang, Y. Chen, H. Li, DAC 2014.
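
A minimal sketch of why interleaving helps; the geometry (8 registers per warp, 2 access ports) is an assumption chosen only to reproduce the 8-cell to 4-cell reduction quoted on the slide.

```python
REGS_PER_WARP = 8   # assumed register count per warp
NUM_PORTS = 2       # assumed access ports per register-file bank

def naive_mapping(reg):
    # All of a warp's registers sit behind a single access port:
    # register i is i+1 cells away from the port.
    return reg + 1

def wrr_mapping(reg):
    # WRR interleaves registers across the ports: register i goes to
    # port (i % NUM_PORTS) at depth i // NUM_PORTS + 1.
    return reg // NUM_PORTS + 1

max_naive = max(naive_mapping(r) for r in range(REGS_PER_WARP))  # 8 cells
max_wrr = max(wrr_mapping(r) for r in range(REGS_PER_WARP))      # 4 cells
print(f"max shift distance: naive={max_naive}, WRR={max_wrr}")
```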

  17. Racetrack for GPU • Write buffer: – "piggyback-write": writes are held in a buffer and written back to the RF from there; – the write-back relies on the track movement already triggered by read requests, so it adds no extra shifts; – positive side effect: the buffer filters redundant RF reads/writes by leveraging RAW and WAW dependencies. [Figure: numbered buffered requests flowing to EXE/MEM.]
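
A sketch of that idea in Python; the buffer structure and the rf.read / rf.write / rf.is_aligned interface are hypothetical stand-ins for the hardware.

```python
class PiggybackWriteBuffer:
    def __init__(self):
        self.pending = {}   # register -> value awaiting write-back

    def write(self, reg, value):
        # WAW filtering: a newer buffered write overwrites the older one,
        # so the older value is never written to the register file at all.
        self.pending[reg] = value

    def read(self, reg, rf):
        if reg in self.pending:
            # RAW filtering: forward the buffered value; no RF access needed.
            return self.pending[reg]
        value = rf.read(reg)    # this read shifts the track anyway...
        self.drain_aligned(rf)  # ...so piggyback any write that is now free
        return value

    def drain_aligned(self, rf):
        for reg in list(self.pending):
            if rf.is_aligned(reg):   # track already at this register's cell
                rf.write(reg, self.pending.pop(reg))
```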

  18. Racetrack for GPU • Experimental results: – Baseline: SRAM-based register files. – Energy reduction: 59%. – Performance improvement: 4%.

  19. Outline • Introduction • Evolution with eNVM: – On-chip high speed storage; – Secondary storage; • PCRAM and NAND hybrid SSD; • Revolution with eNVM: – Memristor-based neuromorphic accelerator. • Conclusion

  20. Hybrid SSD • Memory hierarchy (random access cost): on-chip memory, 1~30 cycles; off-chip memory, 100~300 cycles; solid state disk (Flash), 25K~2M cycles. • Flash is page-mode: an erase unit holds pages PN=0..n, each marked valid (V) or invalid; it requires erase-before-write (EBW), so in-place update (IPU) is not possible. Courtesy: Al Fazio (Intel)

  21. PRAM (PCM) Cell • One transistor/diode and one GST (GeSbTe) element. • Supports in-place updating (IPU). [Figure: GST layer between a top electrode and a heater on the bottom electrode; crystalline GST gives low resistance ('1'), amorphous GST gives high resistance ('0').]

  22. Hybrid SSD • Conventional SSD: FLASH. • Promising candidate: PRAM (phase change). • To combine the benefits of both technologies: a hybrid SSD. • Two uses: – performance; – reliability.

  23. Hybrid SSD: performance enhancement [Figure: erase units 1-3, each holding pages PN=0..n marked valid (V) or invalid (I); their valid pages are merged into the empty pages of a fresh unit.] • The merge operation is time-consuming: all valid pages scattered across partially invalidated erase units must be copied out before the units can be erased. PN = page number; V = valid; I = invalid. Erase unit = 128/256 KB, page = 512 bytes ~ 8 KB. G. Sun, Y. Joo, Y. Chen, Y. Xie, Y. Chen, H. Li, HPCA, 2010.
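
A toy model of why the merge is expensive: every valid page costs a read-plus-program copy, and each source unit costs a whole-unit erase. The data layout is an illustrative assumption.

```python
def merge(erase_units, free_unit):
    """Copy all valid pages into free_unit, then erase the source units.

    erase_units: list of dicts, each mapping page number -> (data, valid).
    Returns (page_copies, unit_erases), the two cost components.
    """
    copies = 0
    for unit in erase_units:
        for pn, (data, valid) in unit.items():
            if valid:
                free_unit[pn] = (data, True)   # one read + one program
                copies += 1
        unit.clear()                           # whole-unit erase (EBW)
    return copies, len(erase_units)
```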

  24. Hybrid SSD: performance enhancement • Hybrid architecture: the data region stays in NAND flash, organized in erase units, while the log region is moved to PRAM, which supports in-place updating and is managed in 512-byte sectors together with a data buffer in memory. [Figure: physical view vs. structural view of the hybrid architecture.]
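
Under these assumptions, a write-path sketch: updates land in the PRAM log sector by sector with no erase, and NAND is rewritten only when a unit's log fills and is merged back. Structure names and the log capacity are illustrative.

```python
class HybridSSD:
    def __init__(self, log_sectors_per_unit=64):   # assumed log capacity
        self.flash = {}      # (unit, page) -> data: NAND data region
        self.pram_log = {}   # unit -> {page: data}: PRAM log region
        self.log_capacity = log_sectors_per_unit

    def write(self, unit, page, data):
        # In-place update in PRAM: no erase-before-write needed.
        log = self.pram_log.setdefault(unit, {})
        log[page] = data
        if len(log) >= self.log_capacity:
            self.merge(unit)

    def merge(self, unit):
        # Only when the log fills do we pay for rewriting NAND pages.
        for page, data in self.pram_log.pop(unit).items():
            self.flash[(unit, page)] = data
```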

  25. Different Log Assignments • Static log assignment: each erase unit's data region has a fixed set of log pages. • Group log assignment: log pages are organized in groups shared by several erase units. • Dynamic log assignment: log pages are allocated on demand from a common log region. A sketch contrasting the two extremes follows.
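
A sketch of the static and dynamic policies (group assignment sits in between, sharing a pool per group); pool sizes and the alloc interface are illustrative assumptions.

```python
class StaticLog:
    """Each erase unit owns a fixed, private slice of log pages."""
    def __init__(self, units, pages_per_unit):
        self.quota = {u: pages_per_unit for u in range(units)}

    def alloc(self, unit):
        if self.quota[unit] == 0:
            return None          # this unit must merge, even if others are idle
        self.quota[unit] -= 1
        return unit

class DynamicLog:
    """All log pages come from one shared pool, allocated on demand."""
    def __init__(self, total_pages):
        self.free = total_pages

    def alloc(self, unit):
        if self.free == 0:
            return None          # merge only when the whole pool is exhausted
        self.free -= 1
        return unit
```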

  26. Hybrid SSD: performance enhancement

  27. Outline • Introduction • Evolution with eNVM: – On-chip high speed storage; – Secondary storage; • Revolution with eNVM: – Memristor-based neuromorphic accelerator. • Conclusion

  28. Computing: Present and Future [Charts: clock frequency (MHz) and power density (mW/mm²) vs. year, 1990-2010; power density heads past hot-plate levels toward nuclear-reactor and rocket-launch levels, with multi-core and neural-network designs bending the curves.] – New trend: multi-core, advanced power management, large on-chip storage. – Future: heterogeneous systems, brain-like computing. Source: CPU DB, Intel

  29. Brain – The Most Efficient Computing Machine • Brain: an extremely complex organ with 15–30B neurons, roughly 4 km of wiring per mm³, running on about 35 W. • Neocortex: 6 layers of gray matter over white matter; signals travel within and between layers. • Neuron: processes signals from other neurons. • Synapse: memory; it weights the signals. Together they form a neural network.
