Speeding Up Crossbar Resistive Memory by Exploiting In-memory Data Patterns
Wen Wen, Lei Zhao, Youtao Zhang, Jun Yang
March 12, 2018
Executive Summary
• Problems: performance and reliability of write operations
  • Large sneak currents and IR drop in crossbar ReRAM
• Proposed solutions: speeding up the RESET operation based on data patterns
  • Profiling the number of bitline LRS cells by exploiting the intrinsic in-memory processing capability of crossbar ReRAM
  • Data compression and a row-address-dependent layout to reduce bitline LRS cells
• Contributions
  • Correlation between RESET latency and the number of LRS cells on selected bitlines
  • A novel profiling technique to dynamically track bitline data patterns
• Results
  • Performance: 20.5% improvement over baseline, 14.2% over state-of-the-art
  • Dynamic energy: 15.7% less than baseline, 7.6% less than state-of-the-art
Outline
• Background and Motivation
• Design Details
  • Low Overhead Runtime Profiling
  • Reduce Bitline LRS Cells
• Evaluation
  • Methodologies
  • Experimental Results
• Conclusion
ReRAM Cell
• High Resistance State (HRS, logic “0”)
• Low Resistance State (LRS, logic “1”)
• SET: “High” to “Low” resistance; RESET: “Low” to “High” resistance
[Figure: ReRAM cell structure and two resistance states. Labels: top metal layer, metal oxide, bottom metal layer, oxygen ion, oxygen vacancy.]
ReRAM Crossbar Structures
[Figure: ReRAM array structures: “1T1R” structure, “1D1R” structure, and “0T1R” crossbar structure. Labels: wordline, bitline, sourceline, diode, ReRAM cell.]
✓ Smallest 4F² planar cell size, low fab cost, and better scalability
✘ Sneak currents and IR drop
Sneak Currents in Crossbar ReRAM
[Figure: crossbar array during a write. Labels: selected cell, half-selected cells, line biases V, 1/2V, and 0, sneak current.]
• Diode selectors help but cannot eliminate sneak currents
• Sneak currents lead to a serious IR drop issue
  • This hurts energy efficiency, performance, and write reliability
• The slower RESET operation is the performance bottleneck
  • SET takes less time than RESET [Xu et al., HPCA'15; Zhang et al., DATE'16]
How does IR drop affect RESET latency?
$t \times e^{k V_d} = C$
• RESET latency is highly sensitive to voltage drop: the two are related by an inverse exponential
  • $t$: RESET switching time; $V_d$: voltage drop
  • $C$ and $k$ are experimentally fitted constants
• A voltage drop of 0.4 V results in a 10x RESET latency increase [Govoreanu et al., IEDM'11]
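To make the exponential sensitivity concrete, here is a minimal sketch that fits the constant k from the single data point on the slide (0.4 V gives 10x) and evaluates the slowdown for other voltage losses. The function names are hypothetical, and the sketch assumes the relation t × e^(k·V_d) = C holds exactly over the range of interest.

```python
import math

def fit_k(delta_v: float = 0.4, slowdown: float = 10.0) -> float:
    """Fit k (in 1/V) from one data point: losing delta_v volts slows RESET by `slowdown`."""
    # From t * exp(k * V_d) = C: t2 / t1 = exp(k * (V1 - V2)) = exp(k * delta_v)
    return math.log(slowdown) / delta_v

def reset_slowdown(delta_v: float, k: float) -> float:
    """Relative RESET latency increase when the cell voltage is reduced by delta_v volts."""
    return math.exp(k * delta_v)

k = fit_k()
print(round(k, 3))                       # 5.756
print(round(reset_slowdown(0.2, k), 2))  # 3.16
```

Under this fit, even half of the 0.4 V loss already roughly triples the RESET time, which is why reducing IR drop pays off so quickly.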
Intuitive Thoughts
Facts
1. During RESET, half-selected cells behave as resistive devices.
2. Under the same voltage stress, a half-selected cell in LRS has a larger sneak current than one in HRS.
3. RESET operations conservatively use the worst-case access latency of all cells in ReRAM arrays.
Observations
1. The number of half-selected cells in LRS affects RESET latency.
2. Dynamically profiling and tracking runtime data patterns may avoid using the worst-case RESET latency for all cells.
TODO List
1. Explore the correlation between RESET latency and the number of bitline LRS cells.
2. Build a runtime profiler to dynamically track bitline data patterns.
3. Reduce the number of LRS cells on bitlines.
RESET latency vs. # of LRS cells
[Figure: RESET latency (ns, axis 0 to 250) and voltage drop (V, axis 2.7 to 3) versus bitline LRS cell percentage (100% down to 0%).]
• The more LRS cells there are on the bitline, the larger the IR drop caused by the sneak current, and the longer the RESET operation takes.
RESET latency vs. # of LRS cells
• This impact diminishes as the row gets closer to the write driver.
Low Overhead Runtime Profiling (1)
[Figure: Mat0, Mat1, ..., Mat63 form one bitline-sharing-set. Each mat has wordline decoders and a shared ADC & comparator that produce a 3-bit profiling value. The 64B cachelines a0 and a1 are each spread across the 64 mats.]
• Dot-product operation: Vread is applied to the wordlines, and each cell contributes a current to the bitline
  • LRS cell: I1 = Vread/Ron
  • HRS cell: I2 = Vread/Roff
  • Bitline current: I = I1 + I2
• The bitline current passes through a transmission gate and a mux (shared with other columns) to sample-and-hold (S/H) circuits, then to the shared ADC and shared comparators, which output a 3-bit counter value
• Update W-Flag = MAX{64 3-bit values}
• The worst-case bitline data pattern within one bitline-sharing-set determines the optimal RESET latency
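The dot-product read can be illustrated with an idealized model in which every cell on a bitline simply adds Vread/Ron or Vread/Roff to the bitline current. This is only a sketch: the function name and the values chosen for Vread, Ron, and Roff are hypothetical, and wire resistance and sneak paths are ignored.

```python
def bitline_current(cells, v_read=0.3, r_on=1e4, r_off=1e6):
    """Ideal summed bitline current in amperes.

    cells: iterable of bits on one bitline (1 = LRS, 0 = HRS), all wordlines at v_read.
    v_read, r_on, r_off: placeholder values, not taken from the slides.
    """
    i_lrs = v_read / r_on    # I1 = Vread / Ron
    i_hrs = v_read / r_off   # I2 = Vread / Roff
    return sum(i_lrs if bit else i_hrs for bit in cells)

# More LRS cells on the bitline means a larger aggregated current.
sparse = bitline_current([1] * 64 + [0] * 448)
dense = bitline_current([1] * 448 + [0] * 64)
print(sparse < dense)  # True
```

Because Ron is much smaller than Roff, the summed current is dominated by the LRS cells, so it serves as a proxy for the LRS cell count.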
Low Overhead Runtime Profiling (2)
[Figure: mapping from aggregated bitline current to a 3-bit value of LRS cell percentage, with a safeguarding area between adjacent counter values. The dot-product read path (Vread, Ron, Roff, I = I1 + I2, transmission gate, mux, S/H, shared ADC, shared comparators) is repeated from the previous slide.]

LRS cells (%)   0      12.5   25     37.5   50      62.5    75      87.5    100
I (mA)          0.089  0.507  0.704  0.867  1.0326  1.2345  1.5043  1.9166  2.7122
Counter            000    001    010    011    100     101     110     111

• Aggregated currents are first converted into digital counters, which represent LRS cell percentages.
• W-Flag is updated by comparing all counters in one bitline-sharing-set to find the worst-case bitline.
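The current-to-counter conversion and the W-Flag update can be sketched as a threshold lookup followed by a maximum. The threshold list below assumes that the current labels in the figure pair up, in order, with the 12.5% to 87.5% ticks on the LRS axis; that pairing, the function names, and the omission of the safeguarding area are assumptions of this sketch.

```python
from bisect import bisect_right

# Assumed currents (mA) at 12.5%, 25%, ..., 87.5% LRS cells, read off the figure labels.
THRESHOLDS_MA = [0.507, 0.704, 0.867, 1.0326, 1.2345, 1.5043, 1.9166]

def current_to_counter(i_ma: float) -> int:
    """Map an aggregated bitline current (mA) to a 3-bit counter in 0..7.

    A current exactly on a boundary is rounded up to the higher (more conservative) range.
    """
    return bisect_right(THRESHOLDS_MA, i_ma)

def update_w_flag(currents_ma) -> int:
    """W-Flag is the maximum counter over all mats in one bitline-sharing-set."""
    return max(current_to_counter(i) for i in currents_ma)

print(format(current_to_counter(1.1), "03b"))  # 100
print(update_w_flag([0.1] * 63 + [1.6]))       # 6
```

A single heavily populated bitline is enough to raise W-Flag for the whole set, which matches the worst-case rule on the slide.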
Row Address Impact
• So far we have covered how to profile bitline data patterns, but we have not yet exploited the impact of row address on RESET latency.
• Rows are mapped by address to 8 groups with different worst-case RESET latencies:
  • Row Address Group #0: rows 0-63
  • Row Address Group #1: rows 64-127
  • ...
  • Row Address Group #7: rows 448-511
• RESET latency decreases from group #0 to group #7, i.e., toward the write drivers & SA.
[Figure: array with wordline decoders on the side, write drivers & SA at the bottom, and an arrow marking the direction of decreasing RESET latency.]
A Summary of the Profiling Technique
• Finding the worst-case bitline: 3-bit W-Flag
  • Records the percentage of LRS cells in the worst-case bitline
  • Detected periodically in each mat
• Tracking the worst case: 6-bit W-Cnt
  • W-Cnt is cleared when W-Flag is updated
  • W-Cnt is incremented on each write
  • W-Cnt overflow triggers an increment of W-Flag
• RESET latency optimization
  • W-Flag, W-Cnt, and row address are used to determine tWR for RESET
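The W-Flag and W-Cnt bookkeeping summarized above can be sketched as a small state machine. The class and method names are hypothetical, and two details are assumptions: W-Flag saturates at 111, and an overflow counts as a W-Flag update (so it also clears W-Cnt). With 512 rows per bitline, 64 writes can add at most 64 LRS cells, i.e., one 12.5% step, which seems consistent with a 6-bit counter.

```python
class BitlineTracker:
    """Tracks the worst-case bitline LRS range between periodic profiling runs."""

    W_FLAG_MAX = 0b111   # 3-bit W-Flag
    W_CNT_LIMIT = 64     # a 6-bit W-Cnt overflows on the 64th increment

    def __init__(self, w_flag: int = 0):
        self.w_flag = w_flag
        self.w_cnt = 0

    def profile(self, counters) -> None:
        """Periodic profiling: take the worst 3-bit counter and clear W-Cnt."""
        self.w_flag = max(counters)
        self.w_cnt = 0

    def on_write(self) -> None:
        """Each write bumps W-Cnt; on overflow, move W-Flag to the next range."""
        self.w_cnt += 1
        if self.w_cnt == self.W_CNT_LIMIT:
            self.w_flag = min(self.w_flag + 1, self.W_FLAG_MAX)  # assumed saturation
            self.w_cnt = 0

tracker = BitlineTracker()
tracker.profile([1, 3, 2])
for _ in range(64):
    tracker.on_write()
print(tracker.w_flag, tracker.w_cnt)  # 4 0
```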
Determine RESET Timing (ns)
• RESET latency depends on bitline data patterns (W-Flag, W-Cnt) and row address

LRS cell ratio          Row address group
(W-Flag, W-Cnt)   0      1      2      3      4      5      6      7
111               202.4  197.7  184.9  165.9  142.3  117.2  92.4   69.1
110               202.4  197.7  184.9  165.9  142.3  117.2  92.4   69.1
101               199    194    181.8  162.9  139.8  115    90.5   68
100               189    184.3  172.6  154.8  132.9  109    85.8   65.5
011               173.8  169.7  158.5  142    121.9  99.8   80.2   63.4
010               154.6  150.9  140.9  126    107.9  90.3   74.7   60.9
001               132.9  129.3  120.9  107.9  93.9   81.3   69.2   58.8
000               109.7  106.9  99.7   90.8   81.8   73.2   64.5   56.4

• Example (row 100): the LRS cell ratio is 62.5%~75%, and it cannot exceed 87.5% before W-Cnt overflows
• RESET latency conservatively uses the upper limit, i.e., the worst case of the next LRS cell ratio range
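The timing table can be read as a two-key lookup: W-Flag selects the row and the row address group (row address divided by 64) selects the column. The sketch below copies the values from the table; the function name is hypothetical, and it assumes W-Cnt affects timing only indirectly, by advancing W-Flag on overflow.

```python
ROWS_PER_GROUP = 64  # 512 rows split into 8 row address groups

# RESET latency in ns, indexed by W-Flag (LRS cell ratio), then by row address group 0..7.
RESET_NS = {
    0b111: [202.4, 197.7, 184.9, 165.9, 142.3, 117.2, 92.4, 69.1],
    0b110: [202.4, 197.7, 184.9, 165.9, 142.3, 117.2, 92.4, 69.1],
    0b101: [199.0, 194.0, 181.8, 162.9, 139.8, 115.0, 90.5, 68.0],
    0b100: [189.0, 184.3, 172.6, 154.8, 132.9, 109.0, 85.8, 65.5],
    0b011: [173.8, 169.7, 158.5, 142.0, 121.9, 99.8, 80.2, 63.4],
    0b010: [154.6, 150.9, 140.9, 126.0, 107.9, 90.3, 74.7, 60.9],
    0b001: [132.9, 129.3, 120.9, 107.9, 93.9, 81.3, 69.2, 58.8],
    0b000: [109.7, 106.9, 99.7, 90.8, 81.8, 73.2, 64.5, 56.4],
}

def reset_latency_ns(w_flag: int, row: int) -> float:
    """Look up the RESET latency for a write to `row` under the current W-Flag."""
    return RESET_NS[w_flag][row // ROWS_PER_GROUP]

print(reset_latency_ns(0b100, 10))   # 189.0
print(reset_latency_ns(0b000, 511))  # 56.4
```

In the best case (few LRS cells, row next to the write drivers) the latency drops from 202.4 ns to 56.4 ns, which is where the speedup over a fixed worst-case tWR comes from.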