Low Power Cache Design
M. Bilal Paracha, Hisham Chowdhury, Ali Raza

Acknowledgements
• Ching-Long Su and Alvin M. Despain, University of Southern California, "Cache Design Trade-offs for Power and Performance Optimization: A Case Study"
• C.-L. Su and Alvin M. Despain, "Cache Designs for Energy and Efficiency"
• Zhichun Zhu and Xiaodong Zhang, College of William and Mary, "Access Mode Predictions for Low-Power Cache Design"
• M. D. Powell, A. Agrawal, T. N. Vijaykumar, B. Falsafi, and K. Roy, "Reducing Set-Associative Cache Energy via Selective Direct-Mapping and Way Prediction," MICRO 2001

Today's talk
• Abstract
• Introduction
• Use of cache in microprocessors
• Different designs to optimize cache energy and power consumption
• Design Trade-offs for Power & Performance Optimization
  o Vertical Cache Partitioning
  o Horizontal Cache Partitioning
  o Gray Code Addressing
• Set-Associative Cache Energy Reduction
  o Way Prediction
  o Selective Direct-Mapping
• Access Mode Prediction (AMP)
  o Advantages over Way Prediction and Phased Cache
  o Different prediction techniques
• Evaluation Results
  o Cache Access Times
  o Miss Rates
  o Cache Energy Consumption
• Conclusion
• Acknowledgements

Abstract
• Usage of caches in modern microprocessors
• Caches are designed for high performance and ignore power consumption
• Research activities towards low power cache design

Introduction
• Caches use 30-60% of processor energy in embedded systems
• Use of caches in high performance machines
• Various designs to optimize energy consumption
Use of cache in microprocessors
• High performance products go mobile (notebooks, PDAs, etc.)
• Caches as temporary storage devices
• Design of components with low power consumption

Designs to optimize cache energy consumption

Vertical Cache Partitioning
• Block Buffer
• Block Hit/Miss
• Block Size
• Reduction in cache accesses

Horizontal Cache Partitioning
• Cache segments
• Cache sub-banks
• Hit time, an advantage

Gray Code Addressing
• Gray code vs 2's complement
• Minimizes bit switches
• 2's complement: 31 bits change
• Gray code: 16 bits change

Evaluation Results
• <dm,2>: a direct-mapped cache with block size 2 words
• <dm,4>: a direct-mapped cache with block size 4 words
• <dm,8>: a direct-mapped cache with block size 8 words
• <2lru,2>: a 2-way set-associative cache with block size 2 words
• <2lru,4>: a 2-way set-associative cache with block size 4 words
• <2lru,8>: a 2-way set-associative cache with block size 8 words
• <4lru,2>: a 4-way set-associative cache with block size 2 words
• <4lru,4>: a 4-way set-associative cache with block size 4 words
• <4lru,8>: a 4-way set-associative cache with block size 8 words
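To make the bit-switch figures on the Gray Code Addressing slide concrete, here is a minimal Python sketch that counts how many address bits toggle while stepping through consecutive addresses. The slide does not say which address sequence was measured; the range 0 through 16 is an assumption, chosen because it reproduces the 31 and 16 figures. The helper names `to_gray` and `bit_switches` are illustrative only.

```python
def to_gray(n: int) -> int:
    """Convert a binary-coded integer to its reflected Gray code."""
    return n ^ (n >> 1)


def bit_switches(codes):
    """Total number of bits that toggle between consecutive codes."""
    return sum(bin(a ^ b).count("1") for a, b in zip(codes, codes[1:]))


# Assumption: sequential accesses to addresses 0..16 (17 addresses, 16 steps).
addresses = list(range(17))

binary_switches = bit_switches(addresses)
gray_switches = bit_switches([to_gray(a) for a in addresses])

print("binary-coded addresses:", binary_switches, "bit switches")
print("Gray-coded addresses:  ", gray_switches, "bit switches")
```

Consecutive Gray codes differ in exactly one bit, so the Gray-coded total always equals the number of steps, while plain binary counting pays extra for every carry.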
Cache Access Time
• A direct-mapped cache takes less time to access than a set-associative cache
• For a 1 KB cache, access time is 4.79 ns direct-mapped and 7.15 ns set-associative
• A 2-way set-associative cache is approx. 50% slower than a direct-mapped cache

Energy consumption vs Cache Size (figure)

Energy Consumption (figure)

Reducing Set Associative Cache Energy Via Way Prediction and Selective Direct Mapping

Cache Access Energy Reduction Techniques
• Energy dissipation in the data array is much larger than in the tag array, so energy optimizations are applied to the data array only
• Selective Direct Mapping for D-Caches
• Way Prediction for I-Caches

Different Design Techniques
a) Conventional Parallel Access (figure)
b) Sequential Access (figure)
c) Way Prediction (figure)
d) Selective Direct Mapping (DM) (figure)

Prediction Framework for Selective Direct Mapping (DM) (figure)

Access Mode Prediction for Low Power Cache Design

Different Cache Accessing Modes
• Phased Cache:
  o Compares the tag with all the tags in a particular set; only if a tag matches does it access the data
  o Reads all n tags on every access, which consumes energy and is not efficient

Access the set
↓
Access all n tags
↓
Access the data corresponding to the tag
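The phased access flow above can be sketched in a few lines of Python. This is an illustration, not the implementation from the cited papers: a set is modelled as two parallel lists (tags and data blocks), and the function names `phased_access` and `parallel_access` are hypothetical. Each function returns the data plus the number of tag-array and data-array reads, so the two modes can be compared.

```python
def phased_access(set_tags, set_data, tag):
    """Phased cache: read all n tags first, then only the matching data way."""
    tag_reads = len(set_tags)          # phase 1: compare against every tag
    for way, stored_tag in enumerate(set_tags):
        if stored_tag == tag:
            return set_data[way], tag_reads, 1   # phase 2: one data read
    return None, tag_reads, 0          # miss: no data way is read


def parallel_access(set_tags, set_data, tag):
    """Conventional parallel access: read all n tags and all n data ways."""
    n = len(set_tags)
    for way, stored_tag in enumerate(set_tags):
        if stored_tag == tag:
            return set_data[way], n, n
    return None, n, n


# Hypothetical 4-way set.
tags = [0x1A, 0x2B, 0x3C, 0x4D]
data = ["blk_a", "blk_b", "blk_c", "blk_d"]

print(phased_access(tags, data, 0x3C))
print(parallel_access(tags, data, 0x3C))
```

The saving of the phased mode is in data-array reads; the tag-array cost is the same n reads in both modes.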
Way Prediction
• Access only the predicted tag and data
• Efficient when the hit rate is high
• Not very efficient when there is a miss (has to access the rest of the tag and data elements)

Access the set
↓
Way Prediction
↓
Access the predicted data and tag sub-array in the set
↓
Prediction correct?
  Yes → Proceed
  No → Compare the rest of the data and tag array

Access Mode Prediction (AMP)
• Prediction based approach
• Better to use Way Prediction when the hit rate is very high
• When the hit rate is low, it is preferable to use the Phased Cache approach
• Predicts whether a cache access will result in a hit or a miss. If it predicts a hit, Way Prediction is used; otherwise the Phased Cache approach is used
• Accuracy of the access mode prediction determines the efficiency of the approach

Different Predictors
• Saturating Counter:
  o Similar to the saturating counter of branch prediction used in Project 2
  o Maintains a two-bit counter which increments on a cache hit and decrements on a cache miss
• Two-level adaptive predictor:
  o Adaptive two-level branch prediction using a global pattern-history table (GAg)
    - A K-bit history register records the results of the most recent K accesses
    - For a hit the register records a 1, otherwise a 0
    - The K bits are used to index the global pattern history table, which has 2^K entries; each entry is a 2-bit saturating counter
  o Per-address two-level global pattern history table (PAg)
    - Each set has its own access history register
    - All history registers index a single pattern history table
• Correlation predictor

Power Consumption
• Perfect AMP and perfect Way Prediction have a power consumption which is the lower bound for a conventional set-associative cache
• For a predicted hit in the way-prediction cache, the energy consumed is $E_{tag} + E_{data}$, compared with $n \times E_{tag} + E_{data}$ in the phased cache
• A miss in the way-prediction cache will consume $(n + 1) \times E_{tag} + (n + 1) \times E_{data}$, in comparison with $(n + 1) \times E_{tag} + E_{data}$ in the phased cache
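As a rough illustration of how the per-access energy expressions combine with a hit/miss predictor, the sketch below drives a two-bit saturating counter over a synthetic hit/miss trace and picks way prediction or phased access for each reference. Several things are assumptions rather than content from the slides: the energy units (`E_TAG = 1.0`, `E_DATA = 5.0`), the 4-way associativity, the made-up trace, the counter's initial state, and the simplification that every hit under way prediction is predicted to the correct way.

```python
class SaturatingCounter:
    """Two-bit counter: increments on a cache hit, decrements on a miss."""

    def __init__(self, bits=2):
        self.top = (1 << bits) - 1
        self.threshold = 1 << (bits - 1)
        self.value = self.top          # assumption: start at "strongly hit"

    def predict_hit(self):
        return self.value >= self.threshold

    def update(self, hit):
        if hit:
            self.value = min(self.top, self.value + 1)
        else:
            self.value = max(0, self.value - 1)


def way_prediction_energy(hit, n, e_tag, e_data):
    # Hit: E_tag + E_data. Miss: (n + 1) * E_tag + (n + 1) * E_data.
    return e_tag + e_data if hit else (n + 1) * (e_tag + e_data)


def phased_energy(hit, n, e_tag, e_data):
    # Hit: n * E_tag + E_data. Miss: (n + 1) * E_tag + E_data.
    return (n if hit else n + 1) * e_tag + e_data


def amp_energy(trace, n, e_tag, e_data):
    """Use way prediction when a hit is predicted, phased access otherwise."""
    counter = SaturatingCounter()
    total = 0.0
    for hit in trace:
        mode = way_prediction_energy if counter.predict_hit() else phased_energy
        total += mode(hit, n, e_tag, e_data)
        counter.update(hit)
    return total


# Hypothetical parameters and trace (True = hit, False = miss).
N_WAYS, E_TAG, E_DATA = 4, 1.0, 5.0
trace = [True] * 20 + [False] * 10 + [True] * 20

way_total = sum(way_prediction_energy(h, N_WAYS, E_TAG, E_DATA) for h in trace)
phased_total = sum(phased_energy(h, N_WAYS, E_TAG, E_DATA) for h in trace)
amp_total = amp_energy(trace, N_WAYS, E_TAG, E_DATA)

print("way prediction only:", way_total)
print("phased only:        ", phased_total)
print("AMP (2-bit counter):", amp_total)
```

On this synthetic trace, with its long burst of misses, the AMP total comes out below both fixed modes. A trace with short, scattered misses can favour a fixed mode instead, which is why predictor accuracy matters.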
Gshare predictor:
• The XOR of the global access history with the current reference set provides the index for the global pattern history table

Misprediction rate of different predictors (figure)

Conclusion
• Cache designs can be modified to obtain maximum performance and optimal energy consumption
• Experiments suggest that
  o direct-mapped caches (instruction and data) consume less energy for dynamic logic
  o set-associative caches consume less energy for static logic
• Circuit-level techniques can no longer keep power dissipation under a reasonable level
• Reduction of power is done at the architectural level, by producing different schemes for reducing on-chip cache power consumption
Questions?