SPLIT ARRAY CACHES FOR EMBEDDED APPLICATIONS (Euromicro DSD 2010)


  1. DESIGN OF TRACE-BASED SPLIT ARRAY CACHES FOR EMBEDDED APPLICATIONS
     Alice M. Tokarnia, Marina Tachibana
     School of Electrical and Computer Engineering, UNICAMP
     Euromicro DSD 2010

  2. Introduction: Split Array Caches
     - On-chip caches are among the best targets for design optimization
       - Role in performance and power consumption.
       - Caches can be customized.
       - Core-based processors, ASIPs, configurable processors.
     - Arrays
       - Vectors, arrays, data structures.
       - Elements stored at sequential addresses.
     - Array caches
     - Split array caches
       - Defined by partition organization and array-partition mapping.
       - Arrays with distinct locality properties can be mapped to partitions with different organizations.
       - Parallel accesses to the partitions may further improve performance.

  3. Introduction: Related Works
     - Tuning a cache to an application to improve system performance or power consumption
       - Givargis et al. [ICCAD 99]
       - Vahid et al. [DATE 04]
       - Ghosh and Givargis [DATE 03]
       - Gordon-Ross et al. [Ultra Low-Power Electronics and Design 04]
     - Mapping application variables to embedded cache 'parts' according to their locality
       - Panda et al. [DATE 98]
       - Sanchez et al. [IEEE TCCA Newsletters 97] and Gonzalez et al. [IEEE Micro 00]
       - Lee et al. [ETRI Journal 03]
       - Naz et al. [ACM SIGARCH 06]

  4. Introduction: TSAC-EDPs
     - Trace-based split array caches
     - Organization
       - Two-partition array cache.
       - The line size of one partition is twice that of the other.
       - The ways of the two partitions have the same size.
     - Array-partition mapping and partition degree of set-associativity
       - Chosen so as to minimize the average memory access EDP (energy-delay product).

  5. Design Method
     - Inputs
       - Application program, typical inputs, cache size constraint.
     - Outputs
       - TSAC-EDPs and best unified caches.
     - Main concern
       - Navigating a large design space:
         - Array-partition mapping
         - Degree of set-associativity
     - Strategy
       - Start from unified caches and split their ways between two partitions.

  6. Design Method: Design Steps
     1. Trace generation
     2. Definition of unified caches
     3. Trace analysis
     4. Candidate split caches
     5. Cache simulation
     6. Cache evaluation and selection

  7. Design Method: Trace Generation & Definition of Unified Caches
     - Instrument and execute the code to generate a trace of array accesses
       - For each access to an array, the trace has an entry with the array name, address, and number of words.
     - Unified caches $C_0(b, n, m)$ satisfying the size constraint
       - Line size $b$, degree of set-associativity $n$, number of sets $m$.
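As a rough illustration of the second bullet, the sketch below enumerates unified configurations $(b, n, m)$ for a given size. It is not the authors' tool. It assumes that $b$ is in bytes, that "satisfying the size constraint" means $b \times n \times m$ equals the cache size exactly, and that the number of sets is a power of two. The function name, the list of line sizes, and the way limit are hypothetical.

```python
def unified_caches(size_bytes, line_sizes=(8, 16, 32, 64), max_ways=8):
    """Enumerate (b, n, m) with b * n * m == size_bytes.

    Assumptions (not from the slides): b is in bytes, the size
    constraint is an equality, and m must be a power of two.
    """
    configs = []
    for b in line_sizes:
        for n in range(1, max_ways + 1):
            if size_bytes % (b * n) != 0:
                continue  # this (b, n) cannot tile the cache exactly
            m = size_bytes // (b * n)
            if m & (m - 1) == 0:  # keep power-of-two set counts only
                configs.append((b, n, m))
    return configs


if __name__ == "__main__":
    for cfg in unified_caches(8 * 1024):
        print(cfg)
```

Under these assumptions the enumeration for 8192 bytes includes (32, 2, 128), and the one for 12288 bytes includes (16, 6, 128), both of which appear in the results table of this deck.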

  8. Design Method: Trace Analysis
     - Concentration of line $L_i$ of array $A$: $X(A, L_i, b)$
       - Fraction of the accesses to $L_i$ that fall in the same half of $L_i$ as the previous access.
     - [Diagram: two example access sequences over lines $L_1$ and $L_2$]
       - Example 1: $X(A, L_1, 8) = 4/4 = 1$
       - Example 2: $X(A, L_1, 8) = 1/4 = 0.25$
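The definition of line concentration can be sketched as follows. This is one reading of the slide, not the authors' code. It assumes accesses are given as byte offsets within a single line, and that the first access to the line counts as concentrated. The second assumption is made so that four accesses to one half give 4/4, as in Example 1. The function and parameter names are hypothetical.

```python
def line_concentration(offsets, line_size):
    """Fraction of accesses that fall in the same half of the line
    as the previous access.

    offsets: byte offsets (0 .. line_size - 1) of successive accesses
             to one line, in trace order.
    Assumption: the first access has no predecessor and is counted
    as concentrated.
    """
    if not offsets:
        raise ValueError("line was never accessed")
    half = line_size // 2
    same = 1                      # first access counted by assumption
    prev_half = offsets[0] // half
    for off in offsets[1:]:
        cur_half = off // half
        if cur_half == prev_half:
            same += 1
        prev_half = cur_half
    return same / len(offsets)
```

With `line_size = 8`, the sequence `[0, 1, 2, 3]` stays in one half and gives 1.0, while `[0, 4, 1, 5]` alternates halves and gives 0.25. These sequences are made up to reproduce the two values on the slide; the slide's own sequences are not recoverable from the text.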

  9. Design Method: Trace Analysis
     - $X(A, b)$: concentration of $A$
       - Median concentration of the lines accessed.
     - $N(A, b)$: number of distinct lines of $A$ accessed.
     - $N_t(b)$: number of distinct lines accessed for all arrays.
     - Concentration defines a partial ordering of the arrays.
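A minimal sketch of these per-array statistics, starting from per-line concentrations that are assumed to be already computed. The input layout (a dictionary of dictionaries) and the function name are hypothetical. $N_t(b)$ is taken here as the sum of the $N(A, b)$, which assumes that no cache line is shared between two arrays.

```python
from statistics import median


def array_stats(line_conc):
    """line_conc maps array name -> {line index: concentration}.

    Returns (X, N, N_t):
      X[a]  median concentration of the accessed lines of a
      N[a]  number of distinct accessed lines of a
      N_t   total number of distinct accessed lines
            (assumes arrays do not share lines)
    """
    x = {a: median(lines.values()) for a, lines in line_conc.items()}
    n = {a: len(lines) for a, lines in line_conc.items()}
    n_total = sum(n.values())
    return x, n, n_total


if __name__ == "__main__":
    demo = {
        "A": {0: 1.0, 1: 0.5, 2: 0.75},
        "B": {0: 0.25, 1: 0.25},
    }
    x, n, n_total = array_stats(demo)
    # Arrays ordered from lowest to highest concentration.
    print(sorted(x, key=x.get), n, n_total)
```

Sorting the arrays by `X` gives the ordering used in the next step. Arrays with equal concentration are not ordered relative to each other, which is why the ordering is only partial.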

  10. Design Method: Candidate Split Caches
      - Array-partition mapping
        - Arrays whose accesses exhibit lower concentration → $C_1(b, n_1, m)$
        - Other arrays → $C_2(b/2, n_2, 2m)$
      - At most (#arrays - 1) array-partition mappings are considered.
      - An exhaustive search would consider $2^{\#\text{arrays}}$ array-partition mappings.
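The reduction from $2^{\#\text{arrays}}$ to at most (#arrays - 1) mappings can be sketched as a sweep of a single cut point over the arrays sorted by concentration. The code below is an illustration under that reading. Skipping cuts between arrays of equal concentration is an assumption made to explain the "at most", and the concentration values in the demo are invented.

```python
def candidate_mappings(concentration):
    """concentration maps array name -> X(A, b).

    Returns a list of (c1_arrays, c2_arrays): the lower-concentration
    prefix goes to C1 (line size b), the rest to C2 (line size b/2).
    Assumption: arrays with equal concentration are never separated.
    """
    order = sorted(concentration, key=lambda a: (concentration[a], a))
    mappings = []
    for k in range(1, len(order)):
        if concentration[order[k - 1]] == concentration[order[k]]:
            continue  # a cut here would split a tie
        mappings.append((order[:k], order[k:]))
    return mappings


if __name__ == "__main__":
    # Hypothetical concentrations for five arrays.
    conc = {"X": 0.9, "Y": 0.8, "Z": 0.7, "A": 0.4, "B": 0.2}
    for c1, c2 in candidate_mappings(conc):
        print("C1 =", c1, "| C2 =", c2)
```

For five arrays with distinct concentrations this produces four mappings instead of 32.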

  11. Design Method: Candidate Split Caches
      - Degree of set-associativity of the partitions
        - The number of lines in a partition is approximately proportional to the number of distinct lines accessed by the arrays mapped to it.
        - $n_2 = \left\lfloor n \cdot \sum_{A \text{ mapped to } C_2} N(A, b) / N_t(b) + 0.5 \right\rfloor$
        - $n_1 = n - n_2$
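A small sketch of the way allocation, assuming the rule is "give $C_2$ a share of the $n$ ways proportional to the distinct lines of its arrays, rounded to the nearest integer, and give $C_1$ the remainder". The function and argument names are hypothetical, and the numbers in the usage note are invented.

```python
import math


def split_ways(n, lines_per_array, c2_arrays):
    """Split the n ways of a unified cache between C1 and C2.

    lines_per_array: array name -> N(A, b), distinct lines accessed
    c2_arrays:       names of the arrays mapped to C2
    Returns (n1, n2). No clamping is applied, so a partition can end
    up with zero ways; such candidates would have to be discarded or
    adjusted by the caller (not specified on the slide).
    """
    n_total = sum(lines_per_array.values())
    share_c2 = sum(lines_per_array[a] for a in c2_arrays) / n_total
    n2 = math.floor(n * share_c2 + 0.5)  # round to nearest integer
    n1 = n - n2
    return n1, n2
```

For example, with `n = 4`, `lines_per_array = {"A": 30, "B": 10}` and `c2_arrays = ["B"]`, the share of $C_2$ is 0.25, so the call returns `(3, 1)`.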

  12. Design Method: Examples of Cache Splitting
      - Unified cache with arrays $C_0 = \{X, Y, Z, A, B\}$: ways $1, 2, \ldots, n - 1, n$, each with line size $b$ and $m$ sets.
      - Example splits $(C_2 \mid C_1)$:
        - $C_2 = \{X, Y, Z, A\}$, $C_1 = \{B\}$
        - $C_2 = \{X, Y, Z\}$, $C_1 = \{A, B\}$
        - $C_2 = \{X\}$, $C_1 = \{A, B, Y, Z\}$
      - [Diagram: in each split, some ways keep line size $b$ with $m$ sets and the others have line size $b/2$ with $2m$ sets]

  13. Design Method: Simulation of Unified and Split Caches
      - Simulation provides the metric used for cache evaluation
        - Miss rates.
      - Other metrics can be obtained from cache models and memory datasheets
        - Hit access time and energy consumption.
        - Miss (time) penalty and miss (energy) penalty.

  14. Design Method: Cache Evaluation, $C_0$
      - A fraction $\Phi$ of the accesses consists of two parallel accesses, one to $C_1$ and the other to $C_2$.
      - $\text{Average\_memory\_access\_time}(C_0) = (1 + \Phi) \times \left[ \text{Hit\_time}(C_0) + \text{Miss\_rate}(C_0) \times \text{Miss\_penalty}(C_0) \right]$

  15. Design Method: Cache Evaluation, $C_1 \mid C_2$
      - $\text{Average\_memory\_access\_time}(C_i) = \text{Hit\_time}(C_i) + \text{Miss\_rate}(C_i) \times \text{Miss\_penalty}(C_i)$
      - $\text{Average\_memory\_access\_time}(C_1 \mid C_2) = f_1 \times \text{Average\_memory\_access\_time}(C_1) + f_2 \times \text{Average\_memory\_access\_time}(C_2)$
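The split-cache evaluation can be sketched numerically as below. The slide does not define $f_1$ and $f_2$; the code assumes they are the fractions of accesses served by $C_1$ and $C_2$. All numeric values are invented for illustration, and the function names are hypothetical.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time of one cache or partition."""
    return hit_time + miss_rate * miss_penalty


def amat_split(f1, part1, f2, part2):
    """Weighted average over the two partitions.

    f1, f2:       assumed to be the fractions of accesses served by
                  C1 and C2
    part1, part2: (hit_time, miss_rate, miss_penalty) tuples
    """
    return f1 * amat(*part1) + f2 * amat(*part2)


if __name__ == "__main__":
    c1 = (1.0, 0.25, 20.0)   # hypothetical: ns, ratio, ns
    c2 = (1.0, 0.50, 10.0)
    print(amat(*c1), amat(*c2), amat_split(0.5, c1, 0.5, c2))
```

The same structure applies to energy if the hit time and miss penalty are replaced by the hit energy and miss energy penalty listed on the simulation slide.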

  16. Design Method: Cache Selection
      - Unified array caches with minimum EDP.
      - Split caches with minimum EDP: TSAC-EDPs.
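Selection by minimum EDP reduces to a one-line search once each candidate has an average access energy and an average access time. The sketch below assumes EDP is the product of those two averages; the candidate labels and numbers are invented.

```python
def edp(energy, delay):
    """Energy-delay product of an average memory access."""
    return energy * delay


def select_min_edp(candidates):
    """candidates maps a label -> (average energy, average time).

    Returns the label with the smallest energy-delay product.
    """
    return min(candidates, key=lambda name: edp(*candidates[name]))


if __name__ == "__main__":
    # Hypothetical (energy in nJ, time in ns) per candidate cache.
    split_candidates = {
        "(64,1,64)|(32,1,128)": (0.50, 4.0),
        "(32,1,128)|(16,1,256)": (0.40, 6.0),
    }
    print(select_min_edp(split_candidates))
```

Running the same search over the unified configurations and over the candidate split caches gives the two columns compared in the results.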

  17. Experimental Results
      - Applications:
        - Convolution (3 arrays)
        - Fast Fourier Transform (4 arrays)
        - Group 3 Fax decoder, G3fax (4 arrays)
        - JPEG encoder (DCT, Quantization) (10 arrays)
        - MPEG-2 video decoder (49 arrays)
      - Cache sizes: 8K-byte, 12K-byte
      - Cache access time and energy
        - CACTI, 90 nm.
      - Memory access time
        - Samsung DDR266.
      - Memory energy consumption
        - 50x the cache energy consumption.

  18. Experimental Results: TSAC-EDPs

      Application   Unified-EDP C_0    TSAC-EDP C_1 | C_2
                    (b, n, m)          (b_1, n_1, m_1) | (b_2, n_2, m_2)
      Conv-8Kb      (32, 2, 128)       (64, 1, 64) | (32, 1, 128)
      FFT-8Kb       (32, 2, 128)       (16, 3, 128) | (8, 1, 256)
      G3fax-8Kb     (16, 2, 256)       (16, 1, 256) | (8, 1, 512)
      JPEG-8Kb      (16, 2, 256)       (32, 1, 128) | (16, 1, 256)
      MPEG-8Kb      (16, 2, 256)       (16, 1, 256) | (8, 1, 512)
      Conv-12Kb     (32, 3, 128)       (64, 1, 64) | (32, 2, 128)
      FFT-12Kb      (16, 6, 128)       (16, 2, 256) | (8, 1, 512)
      G3fax-12Kb    (16, 3, 256)       (16, 1, 128) | (8, 5, 256)
      JPEG-12Kb     (16, 3, 256)       (16, 2, 256) | (8, 1, 512)
      MPEG-12Kb     (16, 3, 256)       (32, 1, 128) | (16, 2, 256)

  19. Experimental Results: Dif EDP
      - $\text{Dif EDP} = (EDP_{C_0} - EDP_{C_1 \mid C_2}) / EDP_{C_0}$
      - [Bar chart: Dif EDP for Conv, FFT, G3fax, JPEG, and MPEG at 8 Kb and 12 Kb; series 0%, 25%, 50%]

  20. Experimental Results: Dif Energy and Dif AccessTime
      - $\text{Dif Energy} = (E_{C_0} - E_{C_1 \mid C_2}) / E_{C_0}$
      - $\text{Dif AccessTime} = (D_{C_0} - D_{C_1 \mid C_2}) / D_{C_0}$
      - [Two bar charts: Dif Energy and Dif AccessTime for Conv, FFT, G3fax, JPEG, and MPEG at 8 Kb and 12 Kb; series 0%, 25%, 50%]

  21. Conclusion
      - For some applications, TSAC-EDPs have better average memory access time, energy, and energy-delay product than unified caches of the same size.
      - Parallel accesses to cache partitions, when possible, can further improve average memory access EDP, time, and energy.
      - The concept of array concentration can be applied to other design methods.
