Low Power Cache Design

M. Bilal Paracha
Hisham Chowdhury
Ali Raza

Acknowledgements
• Ching-Long Su and Alvin M. Despain, University of Southern California, "Cache Design Trade-offs for Power and Performance Optimization: A Case Study"
• C.-L. Su and Alvin M. Despain, "Cache Designs for Energy Efficiency"
• Zhichun Zhu and Xiaodong Zhang, College of William and Mary, "Access-Mode Predictions for Low-Power Cache Design"
• M. D. Powell, A. Agarwal, T. N. Vijaykumar, B. Falsafi, and K. Roy, "Reducing Set-Associative Cache Energy via Selective Direct-Mapping and Way Prediction," MICRO 2001

Today's talk
• Abstract
• Introduction
• Use of cache in microprocessors
• Different designs to optimize cache energy and power consumption
• Design Trade-offs for Power & Performance Optimization
  • Vertical Cache Partitioning
  • Horizontal Cache Partitioning
  • Gray Code Addressing
• Set-Associative Cache Energy Reduction
  • Way Prediction
  • Selective Direct-Mapping
• Access Mode Prediction (AMP)
  • Advantages over Way Prediction and Phased Cache
  • Different prediction techniques
• Evaluation Results
  • Cache Access Times
  • Miss Rates
  • Cache Energy Consumption
• Conclusion
• Acknowledgements

Abstract
• Usage of caches in modern microprocessors
• Caches are designed for high performance and ignore power consumption
• Research activities towards low-power cache design

Introduction
• The cache uses 30-60% of processor energy in embedded systems
• Use of caches in high-performance machines
• Various designs to optimize energy consumption

Use of cache in microprocessors
• High-performance products go mobile (notebooks, PDAs, etc.)
• Caches as temporary storage devices
• Design of components with low power consumption

Designs to optimize cache energy consumption

Vertical Cache Partitioning
• Block buffer
• Block hit/miss
• Block size
• Reduction in cache accesses

Horizontal Cache Partitioning
• Cache segments
• Cache sub-banks
• Hit time, an advantage

Gray Code Addressing
• Gray code vs. 2's complement
• Minimizes bit switches
• 2's complement: 31 bits change
• Gray code: 16 bits change

Evaluation Results
• <dm,2>: a direct-mapped cache with a block size of 2 words
• <dm,4>: a direct-mapped cache with a block size of 4 words
• <dm,8>: a direct-mapped cache with a block size of 8 words
• <2lru,2>: a 2-way set-associative cache with a block size of 2 words
• <2lru,4>: a 2-way set-associative cache with a block size of 4 words
• <2lru,8>: a 2-way set-associative cache with a block size of 8 words
• <4lru,2>: a 4-way set-associative cache with a block size of 2 words
• <4lru,4>: a 4-way set-associative cache with a block size of 4 words
• <4lru,8>: a 4-way set-associative cache with a block size of 8 words
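The bit-switch counts on the Gray Code Addressing slide can be reproduced with a short sketch. It assumes the figures refer to a sequential walk through addresses 0 to 16, which the slide does not state. The helper names `to_gray` and `bit_switches` are illustrative only.

```python
def to_gray(n: int) -> int:
    """Return the reflected binary Gray code of n."""
    return n ^ (n >> 1)


def bit_switches(values) -> int:
    """Total number of bits that flip between consecutive values."""
    values = list(values)
    return sum(bin(a ^ b).count("1") for a, b in zip(values, values[1:]))


# Assumption: sequential addresses 0 through 16 (16 transitions).
addresses = range(17)
binary_switches = bit_switches(addresses)
gray_switches = bit_switches(to_gray(a) for a in addresses)

print(binary_switches, gray_switches)  # prints "31 16"
```

Under that assumption, plain binary addressing flips 31 bits in total, while Gray code flips exactly one bit per step, 16 in all.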

Cache Access Time
• A direct-mapped cache takes less time to access than a set-associative cache
• Access time for a 1 KB cache: 4.79 ns direct-mapped, 7.15 ns set-associative
• A 2-way set-associative cache is approx. 50% slower than a direct-mapped cache

Energy Consumption vs. Cache Size

Energy Consumption

Reducing Set-Associative Cache Energy via Way Prediction and Selective Direct-Mapping

Cache Access Energy Reduction Techniques
• Energy dissipation in the data array is much larger than in the tag array, so energy optimizations are applied to the data array only
• Selective direct-mapping for D-caches
• Way prediction for I-caches

Different Design Techniques
a) Conventional Parallel Access

b) Sequential Access
c) Way Prediction
d) Selective Direct Mapping (DM)

Prediction Framework for Selective Direct Mapping (DM)

Access Mode Prediction for Low Power Cache Design

Different Cache Accessing Modes
• Phased Cache: compares the tag with all the tags in a particular set; only if a tag matches does it access the data
• Consumes energy on every access because all n tags are read; not efficient
  Access the set
  ↓
  Access all n tags
  ↓
  Access the data corresponding to the tag
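The phased-cache flow above can be sketched as a lookup in a single n-way set that counts array reads. This is a minimal illustration, not the presenters' model: `AccessCost`, `phased_access`, and the tag values are hypothetical. It counts only the lookup, not any refill after a miss.

```python
from dataclasses import dataclass


@dataclass
class AccessCost:
    tag_reads: int
    data_reads: int
    hit: bool


def phased_access(set_tags, tag):
    """Phased lookup: read every tag in the set first, then read the
    data array only for the way whose tag matched."""
    tag_reads = len(set_tags)      # phase 1: all n tags are compared
    hit = tag in set_tags
    data_reads = 1 if hit else 0   # phase 2: data is read only on a match
    return AccessCost(tag_reads, data_reads, hit)


ways = [0x1A, 0x2B, 0x3C, 0x4D]    # hypothetical tags in a 4-way set
print(phased_access(ways, 0x3C))   # hit: 4 tag reads, 1 data read
print(phased_access(ways, 0x99))   # miss: 4 tag reads, 0 data reads
```

A conventional parallel-access cache would read all n data ways alongside the tags; the phased design trades a second phase for reading at most one data way.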

Way Prediction
• Access only the predicted tag and data
• Efficient when the hit rate is high
• Not very efficient when there is a miss (has to access the rest of the tag and data elements)
  Access the set
  ↓
  Way prediction
  ↓
  Access the predicted data and tag sub-array in the set
  ↓
  Prediction correct?
  Yes → Proceed
  No → Compare the rest of the data and tag array

Access Mode Prediction (AMP)
• Prediction-based approach
• Way prediction is better when the hit rate is very high
• When the hit rate is low, the phased cache approach is preferable
• Predicts whether a cache access will result in a hit or a miss. If it predicts a hit, way prediction is used; otherwise the phased cache approach is used
• The accuracy of the access mode prediction determines the efficiency of the approach

Different Predictors
• Saturating counter
  • Similar to the saturating counter of branch prediction used in project 2
  • Maintains a two-bit counter that increments on a cache hit and decrements on a cache miss
• Two-level adaptive predictor
  • Adaptive two-level branch prediction using a global pattern history table (GAg)
    • A K-bit history register records the results of the most recent K accesses
    • For a hit the register records a 1, otherwise a 0
    • These K bits index the global pattern history table, which has 2^K entries; each entry is a 2-bit saturating counter
  • Per-address two-level prediction with a global pattern history table (PAg)
    • Each set has its own access history register
    • All history registers index a single pattern history table
• Correlation predictor
• Gshare predictor
  • The XOR of the global access history with the current reference set provides the index into the global pattern history table

Power Consumption
• Perfect AMP and perfect way prediction have a power consumption that is the lower bound for a conventional set-associative cache
• On a predicted hit in the way-prediction cache, the energy consumed is E_tag + E_data, compared with n × E_tag + E_data in the phased cache
• A miss in the way-prediction cache consumes (n + 1) × E_tag + (n + 1) × E_data, compared with (n + 1) × E_tag + E_data in the phased cache

Misprediction Rate of Different Predictors

Conclusion
• Cache designs can be modified to obtain maximum performance and optimal energy consumption
• Experiments suggest that
  • direct-mapped caches (instruction and data) consume less energy for dynamic logic
  • set-associative caches consume less energy for static logic
• Circuit-level techniques can no longer keep power dissipation under a reasonable level
• Power is reduced at the architectural level, through different schemes for reducing on-chip cache power consumption
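As a rough sketch of how access mode prediction combines the pieces above, the following code pairs a two-bit saturating counter with the energy expressions from the Power Consumption slide. Several things are assumptions rather than content from the talk: the energy units (`e_tag=1.0`, `e_data=4.0`), the hit threshold of the counter, the access trace, and the simplification that the predicted way is always correct on a hit.

```python
def access_energy(mode, hit, n, e_tag, e_data):
    """Energy of one access, using the expressions from the slides."""
    if mode == "way":
        # hit: E_tag + E_data; miss: (n+1)*E_tag + (n+1)*E_data
        return e_tag + e_data if hit else (n + 1) * (e_tag + e_data)
    # phased: hit: n*E_tag + E_data; miss: (n+1)*E_tag + E_data
    return (n if hit else n + 1) * e_tag + e_data


class SaturatingCounter:
    """Two-bit counter: increments on a hit, decrements on a miss."""

    def __init__(self, bits=2):
        self.max = (1 << bits) - 1
        self.value = self.max          # assumption: start at "strong hit"

    def predict_hit(self):
        return self.value > self.max // 2   # assumed threshold

    def update(self, hit):
        if hit:
            self.value = min(self.max, self.value + 1)
        else:
            self.value = max(0, self.value - 1)


def amp_energy(outcomes, n=4, e_tag=1.0, e_data=4.0):
    """Predicted hit -> way-prediction mode, predicted miss -> phased mode."""
    predictor = SaturatingCounter()
    total = 0.0
    for hit in outcomes:
        mode = "way" if predictor.predict_hit() else "phased"
        total += access_energy(mode, hit, n, e_tag, e_data)
        predictor.update(hit)
    return total


def fixed_mode_energy(outcomes, mode, n=4, e_tag=1.0, e_data=4.0):
    """Energy when every access uses the same mode."""
    return sum(access_energy(mode, hit, n, e_tag, e_data) for hit in outcomes)


# Hypothetical trace: a run of 20 hits followed by a run of 8 misses.
trace = [True] * 20 + [False] * 8
print("way prediction:", fixed_mode_energy(trace, "way"))
print("phased cache:  ", fixed_mode_energy(trace, "phased"))
print("AMP:           ", amp_energy(trace))
```

On this particular trace the AMP total comes out lowest, because the counter switches to phased mode after the first two misses. A different trace or different energy ratios could change the ordering, so this illustrates the mechanism and is not an evaluation result.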

Questions?
