Gated Precharging: Reducing Bitline Precharge in Deep-Sub-µ Caches
Se-Hyun Yang and Babak Falsafi
PowerTap Project
http://www.ece.cmu.edu/~powertap
Computer Architecture Lab (CALCM), Carnegie Mellon University
High Bitline Leakage in Caches
Deep-sub-µ high-performance caches:
• Use subarrays
• Precharge the entire cache:
  • Active subarrays: bitlines discharge
  • Idle subarrays: bitlines leak
Energy wasted in idle subarrays!
[Figure: cache subarrays with their bitline (BL) pairs]
Exploit Temporal Locality in Subarrays
Observation:
• All subarrays precharge and leak
• But only a small number of subarrays are active
[Figure: most subarrays precharging while only a few "hot" subarrays are accessed]
Contribution: Gated Precharging
Precharge only active subarrays
Detect temporal locality:
• Decay counters
• Threshold comparison logic
Reduce precharging:
• by 89% in L1 d-cache
• by 92% in L1 i-cache
• with < 2% performance impact
Outline
• Overview
• Bitline Leakage
• Gated Precharging
• Temporal Locality in Subarrays
• Implementation
• Gating Overhead
• Related Work
• Results
• Conclusion
Bitline Leakage in SRAM Cells
[Figure: SRAM cell with its bitline (BL) pair and wordline]
More than 60% discharge in 0.10 µ
How Much Temporal Locality?
We evaluate, in a small window:
1. How many accesses reuse subarrays?
2. How many subarrays are active?
Subarray Reuse Ratio
Even in a small window, high subarray reuse
e.g., gcc with 32K L1D and 1K subarrays:
• 96% of d-cache accesses reuse subarrays in a 100-cycle window
For all benchmarks, in a 100-cycle window:
• 95% for d-cache
• 98% for i-cache
[Figure: cumulative fraction of accesses vs. subarray access interval (cycles)]
Fraction of Subarrays Accessed
In a small window, a small number of active subarrays
e.g., gcc with 32K L1D and 1K subarrays:
• 19% of subarrays accessed in a 100-cycle window
For all benchmarks, in a 100-cycle window:
• < 29% for d-cache
• < 22% for i-cache
[Figure: fraction of subarrays touched in a window vs. window size (cycles)]
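A minimal sketch of how these two locality metrics could be computed from a cache-access trace. The trace format (a list of (cycle, subarray_id) pairs), the function name, and the non-overlapping-window bookkeeping are assumptions for illustration, not the authors' measurement tool.

    # Sketch: measure subarray temporal locality from an access trace.
    # Assumptions: `trace` is a list of (cycle, subarray_id) tuples;
    # `window` is the window length in cycles (100 in the talk's data).

    def subarray_locality(trace, window=100):
        last_access = {}        # subarray_id -> cycle of its last access
        reused = 0              # accesses that reuse a recently touched subarray
        window_sets = []        # distinct subarrays touched per window
        current_set, window_end = set(), window

        for cycle, subarray in trace:
            # Metric 1: subarray reuse ratio within the window.
            if subarray in last_access and cycle - last_access[subarray] <= window:
                reused += 1
            last_access[subarray] = cycle

            # Metric 2: number of subarrays touched per non-overlapping window.
            while cycle >= window_end:
                window_sets.append(current_set)
                current_set, window_end = set(), window_end + window
            current_set.add(subarray)

        window_sets.append(current_set)
        reuse_ratio = reused / max(len(trace), 1)
        avg_touched = sum(len(s) for s in window_sets) / len(window_sets)
        return reuse_ratio, avg_touched

    # Example: reuse, touched = subarray_locality([(0, 3), (40, 3), (250, 7)])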
Temporal Locality in Subarrays
In a 100-cycle window:
• > 95% of cache accesses reuse < 30% of subarrays
Most accesses are temporally localized in a small number of subarrays
Gated Precharging: Hardware
Decay counter per subarray [Kaxiras, et al.]
Threshold value decides when to precharge
Algorithm:
• if count < threshold → precharge
• if count ≥ threshold → no precharge
[Figure: per-subarray counter, reset on cache access and clocked every cycle, compared against the threshold to drive the precharge control]
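A behavioral sketch of the per-subarray decay-counter gating described on this slide; the class and method names, reset value, and saturating-counter detail are illustrative assumptions, not the actual control logic.

    # Behavioral sketch of gated precharging: one decay counter per subarray.
    # The counter resets on access and counts up each cycle; a subarray is
    # precharged only while its count is below the threshold.

    class GatedPrecharge:
        def __init__(self, num_subarrays, threshold=100, counter_bits=10):
            self.threshold = threshold
            self.max_count = (1 << counter_bits) - 1
            self.count = [0] * num_subarrays

        def tick(self):
            # Each clock cycle, every counter decays (saturating increment).
            self.count = [min(c + 1, self.max_count) for c in self.count]

        def should_precharge(self, subarray):
            # count < threshold  -> subarray is hot: keep precharging
            # count >= threshold -> subarray is idle: gate the precharge
            return self.count[subarray] < self.threshold

        def access(self, subarray):
            # A hit on a gated (idle) subarray pays one extra cycle to precharge.
            extra_cycles = 0 if self.should_precharge(subarray) else 1
            self.count[subarray] = 0   # reset the decay counter on access
            return extra_cycles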
Gated Precharging: Overhead
Minimal performance overhead:
• Hits on idle subarrays incur 1 extra cycle
• Infrequent due to temporal locality (e.g., gcc: < 8% of d-cache accesses)
Minimal energy overhead:
• 10-bit counter per subarray
• Comparison logic
• Reuses the existing precharge control logic
Related Work
Delayed precharging [Alpha 21264]:
• Precharges only the required subarrays
• Increases cache access latency by delaying precharge
Resizable caches [Albonesi] [Yang, et al.]:
• Capture working-set size variation and resize the cache
• Coarse switching granularity (time & space)
• Relatively larger performance overhead
Way prediction [Powell, et al.] [Inoue, et al.]:
• Predicts the set-associative way for the next access
• Orthogonal to gated precharging
Methodology
• Wattch [ISCA 2000]
• 16 SPEC2000/Olden benchmarks
• Performance impact < 2%
• Base case:
  • 8-wide issue, 128-entry RUU
  • 32K direct-mapped L1 I & D with 1K subarrays
  • 512K 4-way unified L2
• Threshold values determined by profiling
• Threshold values ≅ 100 cycles
Results: D-Cache
• Precharging reduced by > 85% for all benchmarks but vpr
• On average by 89%
[Figure: reduced fraction of subarray precharges per benchmark (ammp, art, bh, bisort, bzip2, em3d, equake, gcc, health, mcf, mesa, treeadd, tsp, vortex, vpr, wupwise)]
Results: I-Cache
• Precharging reduced by > 90% for 13 benchmarks
• On average by 92%
[Figure: reduced fraction of subarray precharges per benchmark (ammp, art, bh, bisort, bzip2, em3d, equake, gcc, health, mcf, mesa, treeadd, tsp, vortex, vpr, wupwise)]
Conclusions
High bitline leakage in deep-submicron caches
Energy wasted in idle subarrays
Gated precharging:
• Exploits temporal locality in subarrays
• Reduces 90% of precharging
• With < 2% performance impact
For more information
PowerTap Project
http://www.ece.cmu.edu/~powertap
Computer Architecture Lab, Carnegie Mellon University