DESIGN OF TRACE-BASED SPLIT ARRAY CACHES FOR EMBEDDED APPLICATIONS
Alice M. Tokarnia, Marina Tachibana
School of Electrical and Computer Engineering, UNICAMP
Euromicro DSD 2010

Introduction: Split Array Caches
On-chip caches are ideal targets for design optimization
  They play a major role in performance and power consumption.
  Caches can be customized: core-based processors, ASIPs, configurable processors.
Arrays
  Vectors, arrays, data structures.
  Elements are stored at sequential addresses.
Array caches
Split array caches
  Defined by the partition organization and the array-partition mapping.
  Arrays with distinct locality properties can be mapped to partitions with different organizations.
  Parallel accesses to the partitions may further improve performance.

Introduction: Related Works
Tuning a cache to an application to improve system performance or power consumption
  Givargis et al. [ICCAD 99]
  Vahid et al. [DATE 04]
  Ghosh and Givargis [DATE 03]
  Gordon-Ross et al. [Ultra Low-Power Electronics and Design 04]
Mapping application variables to embedded cache 'parts' according to their locality
  Panda et al. [DATE 98]
  Sanchez et al. [IEEE TCCA Newsletters 97] and Gonzalez et al. [IEEE Micro 00]
  Lee et al. [ETRI Journal 03]
  Naz et al. [ACM SIGARCH 06]

Introduction: TSAC-EDPs
Trace-based split array caches
Organization
  Two-partition array cache.
  The line size of one partition is twice that of the other.
  The ways of the two partitions have the same size.
Array-partition mapping and degree of set-associativity of each partition
  Chosen so as to minimize the average memory access energy-delay product (EDP).

Design Method
Inputs
  Application program, typical inputs, cache size constraint.
Outputs
  TSAC-EDPs and best unified caches.
Main concern
  Navigating a large design space: array-partition mapping and degree of set-associativity.
Strategy
  Start from unified caches and split their ways between two partitions.

Design Method: Design Steps
1. Trace generation
2. Definition of unified caches
3. Trace analysis
4. Candidate split caches
5. Cache simulation
6. Cache evaluation and selection

Design Method: Trace Generation & Definition of Unified Caches
Instrument and execute the code to generate a trace of array accesses
  For each access to an array, the trace has an entry with the array name, the address, and the number of words.
Unified caches C_0(b, n, m) satisfying the size constraint
  Line size b, degree of set-associativity n, number of sets m.

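The definition of unified caches can be sketched as a small enumeration. This is only an illustration under stated assumptions: the function name `unified_caches`, the candidate line sizes, the way limit, the power-of-two restriction on the number of sets, and the choice to use the full size budget (b * n * m equal to the constraint) are all hypothetical and not taken from the slides.

```python
def unified_caches(cache_bytes, line_sizes=(8, 16, 32, 64), max_ways=8):
    """List configurations (b, n, m) whose total size b * n * m equals cache_bytes.

    Assumptions (not from the slides): candidate line sizes, a limit on the
    number of ways, and a power-of-two number of sets.
    """
    configs = []
    for b in line_sizes:
        for n in range(1, max_ways + 1):
            m, rem = divmod(cache_bytes, b * n)
            # Keep only exact fits with a power-of-two number of sets.
            if rem == 0 and m > 0 and (m & (m - 1)) == 0:
                configs.append((b, n, m))
    return configs


if __name__ == "__main__":
    for config in unified_caches(8 * 1024):
        print(config)
```

Under these assumptions the enumeration for 8 KB and 12 KB includes the unified configurations listed in the results table, such as (32, 2, 128) and (16, 6, 128).
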
Design Method: Trace Analysis
Concentration of line L_i of array A
  X(A, L_i, b): fraction of the accesses to L_i that fall in the same half of L_i as the previous access.
Example 1: X(A, L_1, 8) = 4/4 = 1
Example 2: X(A, L_1, 8) = 1/4 = 0.25
[Figure: access diagrams for Example 1 and Example 2, with entries numbered 1 to 4 and labeled L_1 and L_2]

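A minimal sketch of the line concentration computation follows. It assumes that addresses and the line size b use the same unit, that "previous access" means the previous access to the same line, and that the first access to a line counts as falling in the same half. The last assumption is chosen only because it reproduces the 4/4 and 1/4 values of the two examples; the function name and the sample address sequences are hypothetical.

```python
from collections import defaultdict


def line_concentrations(addresses, line_size):
    """Return {line index: X} for one array, given its access addresses in order."""
    half = line_size // 2
    last_half = {}             # half touched by the previous access to each line
    same = defaultdict(int)    # accesses that stayed in the same half
    total = defaultdict(int)   # all accesses to the line
    for addr in addresses:
        line = addr // line_size
        h = (addr % line_size) // half
        # Assumption: the first access to a line counts as "same half".
        if last_half.get(line, h) == h:
            same[line] += 1
        total[line] += 1
        last_half[line] = h
    return {line: same[line] / total[line] for line in total}


if __name__ == "__main__":
    print(line_concentrations([0, 1, 2, 3], 8))  # all accesses in one half
    print(line_concentrations([0, 4, 1, 5], 8))  # accesses alternate between halves
```

With b = 8, four accesses confined to one half give a concentration of 1, and four accesses that alternate between halves give 0.25.
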
Design Method: Trace Analysis
X(A, b): concentration of array A, the median concentration of the lines accessed.
N(A, b): number of distinct lines of A accessed.
N_t(b): number of distinct lines accessed for all arrays.
Concentration defines a partial ordering of the arrays.

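The array-level quantities can be sketched as below, assuming the per-line concentrations are already available and the trace is reduced to (array name, address) pairs. The function names and the sample data are hypothetical.

```python
from statistics import median


def array_concentration(line_conc):
    """X(A, b): median of the per-line concentrations X(A, L_i, b)."""
    return median(line_conc.values())


def distinct_lines(trace, line_size):
    """Return ({array: N(A, b)}, N_t(b)) for a trace of (array, address) pairs."""
    lines = {}
    for name, addr in trace:
        lines.setdefault(name, set()).add(addr // line_size)
    per_array = {name: len(s) for name, s in lines.items()}
    all_lines = set().union(*lines.values()) if lines else set()
    return per_array, len(all_lines)


if __name__ == "__main__":
    print(array_concentration({0: 1.0, 1: 0.25, 2: 0.5}))
    print(distinct_lines([("A", 0), ("A", 4), ("A", 8), ("B", 64)], 8))
```

Using the median rather than the mean keeps a few unusual lines from dominating the concentration of an array.
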
Design Method: Candidate Split Caches
Array-partition mapping
  Arrays whose accesses exhibit lower concentration → C_1(b, n_1, m)
  Other arrays → C_2(b/2, n_2, 2m)
At most (#arrays - 1) array-partition mappings, compared with 2^#arrays mappings for an exhaustive search.

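The reduction from an exhaustive search to at most (#arrays - 1) mappings can be sketched by sorting the arrays by concentration and cutting the sorted list at each position. The concentration values below are hypothetical, and cutting only between arrays with different concentrations is an assumption made here to handle ties.

```python
def candidate_mappings(concentration):
    """Map {array: X(A, b)} to a list of (C1_arrays, C2_arrays) candidates.

    Arrays with lower concentration go to C1 and the others go to C2.
    """
    ordered = sorted(concentration, key=concentration.get)
    mappings = []
    for k in range(1, len(ordered)):
        # Assumption: cut only between arrays with different concentrations.
        if concentration[ordered[k - 1]] < concentration[ordered[k]]:
            mappings.append((ordered[:k], ordered[k:]))
    return mappings


if __name__ == "__main__":
    conc = {"X": 1.0, "Y": 0.8, "Z": 0.6, "A": 0.4, "B": 0.2}  # hypothetical values
    for c1, c2 in candidate_mappings(conc):
        print("C1 =", c1, "C2 =", c2)
    print("exhaustive search:", 2 ** len(conc), "mappings")
```

For five arrays this yields four candidate mappings instead of 32.
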
Design Method: Candidate Split Caches
Degree of set-associativity of the partitions
  The number of lines is approximately proportional to the number of distinct lines accessed.
$$n_2 = \left\lfloor n \cdot \frac{\sum_{A \text{ mapped to } C_2} N(A, b)}{N_t(b)} + 0.5 \right\rfloor$$
$$n_1 = n - n_2$$

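A minimal sketch of dividing the n ways of a unified cache between the partitions, assuming the rule is read as n_2 = floor(n * sum of N(A, b) over the arrays mapped to C_2 / N_t(b) + 0.5) and n_1 = n - n_2. The rounding and the function name `split_ways` are assumptions, and the line counts in the example are hypothetical.

```python
import math


def split_ways(n, lines_c2, lines_total):
    """Divide n ways between C1 and C2 in proportion to distinct lines accessed.

    lines_c2:    sum of N(A, b) over the arrays mapped to C2
    lines_total: N_t(b)
    Returns (n1, n2). Rounding to the nearest integer is an assumed reading.
    """
    n2 = math.floor(n * lines_c2 / lines_total + 0.5)
    n1 = n - n2
    return n1, n2


if __name__ == "__main__":
    # Hypothetical counts: 25 of 100 distinct lines belong to arrays in C2.
    print(split_ways(4, 25, 100))
```

A candidate with n_1 = 0 or n_2 = 0 degenerates to a single partition, so a fuller implementation would probably filter such cases out.
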
Design Method: Examples of Cache Splitting
Unified cache C_0 = {X, Y, Z, A, B}: n ways of line size b, m sets.
Candidate splits (C_2 | C_1), where the ways of C_1 have line size b and m sets and the ways of C_2 have line size b/2 and 2m sets:
  C_2 = {X, Y, Z, A}, C_1 = {B}
  C_2 = {X, Y, Z}, C_1 = {A, B}
  …
  C_2 = {X}, C_1 = {A, B, Y, Z}
[Figure: way layouts of C_0 and of the candidate splits]

Design Method: Simulation of Unified and Split Caches
Simulation provides the metric used for cache evaluation
  Miss rates.
Other metrics can be obtained from cache models and memory datasheets
  Hit access time and energy consumption.
  Miss (time) penalty and miss (energy) penalty.

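Obtaining a miss rate from a trace can be sketched with a small set-associative simulator. The slides do not state the replacement policy or the simulator used, so LRU replacement, one address per access, and the function name `miss_rate` are assumptions of this sketch.

```python
from collections import OrderedDict


def miss_rate(addresses, b, n, m):
    """Miss rate of a cache C(b, n, m) on a sequence of addresses.

    Assumption: LRU replacement within each set.
    """
    sets = [OrderedDict() for _ in range(m)]
    misses = 0
    for addr in addresses:
        line = addr // b
        cache_set = sets[line % m]
        if line in cache_set:
            cache_set.move_to_end(line)        # mark as most recently used
        else:
            misses += 1
            if len(cache_set) >= n:
                cache_set.popitem(last=False)  # evict the least recently used line
            cache_set[line] = True
    return misses / len(addresses) if addresses else 0.0


if __name__ == "__main__":
    trace = [0, 8, 0, 8]
    print(miss_rate(trace, b=8, n=1, m=1))  # two lines compete for one way
    print(miss_rate(trace, b=8, n=2, m=1))  # two ways hold both lines
```

A split cache can be simulated the same way by routing each access to the simulator of the partition its array is mapped to.
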
Design Method: Cache Evaluation of C_0
A fraction Φ of the accesses consists of two parallel accesses, one to C_1 and the other to C_2.
$$\text{Average\_memory\_access\_time}_{C_0} = (\Phi + 1)\left(\text{Hit\_time}_{C_0} + \text{Miss\_rate}_{C_0} \times \text{Miss\_penalty}_{C_0}\right)$$

Design Method: Cache Evaluation of C_1 | C_2
$$\text{Average\_memory\_access\_time}_{C_i} = \text{Hit\_time}_{C_i} + \text{Miss\_rate}_{C_i} \times \text{Miss\_penalty}_{C_i}, \quad i = 1, 2$$
$$\text{Average\_memory\_access\_time}_{C_1|C_2} = f_1 \cdot \text{Average\_memory\_access\_time}_{C_1} + f_2 \cdot \text{Average\_memory\_access\_time}_{C_2}$$

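The evaluation of a split cache can be sketched as follows, assuming the usual form hit time + miss rate × miss penalty for each partition and a combination weighted by f_1 and f_2. The slide does not define these weights, so the sketch simply takes them as inputs. All numeric values are hypothetical.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time of one cache or partition."""
    return hit_time + miss_rate * miss_penalty


def split_amat(f1, amat_c1, f2, amat_c2):
    """Average memory access time of C1 | C2 as a weighted combination."""
    return f1 * amat_c1 + f2 * amat_c2


if __name__ == "__main__":
    # Hypothetical values in nanoseconds.
    t1 = amat(hit_time=1.0, miss_rate=0.05, miss_penalty=100.0)
    t2 = amat(hit_time=1.0, miss_rate=0.03, miss_penalty=100.0)
    print(split_amat(0.5, t1, 0.5, t2))
```

The same two functions apply to energy if the hit energy and the miss energy penalty replace the time values.
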
Design Method: Cache Selection
Unified array caches with minimum EDP.
Split caches with minimum EDP: TSAC-EDPs.

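Selection by minimum EDP can be sketched as below, assuming the EDP of a candidate is the product of its average access energy and its average access time. The candidate names and the numbers are hypothetical.

```python
def min_edp(candidates):
    """Pick the candidate with the smallest energy-delay product.

    candidates: {name: (average_energy, average_access_time)}
    Returns (name, edp).
    """
    scored = ((name, energy * time) for name, (energy, time) in candidates.items())
    return min(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical (energy in nJ, time in ns) pairs.
    split_caches = {
        "(64,1,64)|(32,1,128)": (1.5, 3.0),
        "(32,1,128)|(16,1,256)": (2.0, 3.0),
    }
    print(min_edp(split_caches))
```

Running the same selection once over the unified caches and once over the split candidates gives the two outputs of the method.
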
Experimental Results
Applications
  Convolution (3 arrays)
  Fast Fourier Transform (4 arrays)
  Group 3 fax decoder, G3fax (4 arrays)
  JPEG encoder (DCT, quantization) (10 arrays)
  MPEG-2 video decoder (49 arrays)
Cache sizes: 8 Kbytes, 12 Kbytes
Cache access time and energy: CACTI, 90 nm
Memory access time: Samsung DDR266
Memory energy consumption: 50x the cache energy consumption

Experimental Results: TSAC-EDPs

Application   Unified-EDP C_0   TSAC-EDP C_1 | C_2
              (b, n, m)         (b_1, n_1, m_1) | (b_2, n_2, m_2)
Conv-8Kb      (32, 2, 128)      (64, 1, 64) | (32, 1, 128)
FFT-8Kb       (32, 2, 128)      (16, 3, 128) | (8, 1, 256)
G3fax-8Kb     (16, 2, 256)      (16, 1, 256) | (8, 1, 512)
JPEG-8Kb      (16, 2, 256)      (32, 1, 128) | (16, 1, 256)
MPEG-8Kb      (16, 2, 256)      (16, 1, 256) | (8, 1, 512)
Conv-12Kb     (32, 3, 128)      (64, 1, 64) | (32, 2, 128)
FFT-12Kb      (16, 6, 128)      (16, 2, 256) | (8, 1, 512)
G3fax-12Kb    (16, 3, 256)      (16, 1, 128) | (8, 5, 256)
JPEG-12Kb     (16, 3, 256)      (16, 2, 256) | (8, 1, 512)
MPEG-12Kb     (16, 3, 256)      (32, 1, 128) | (16, 2, 256)

Experimental Results: Dif EDP
Dif EDP = (EDP(C_0) - EDP(C_1|C_2)) / EDP(C_0)
[Figure: Dif EDP for Conv, FFT, G3fax, JPEG, and MPEG at 8 Kb and 12 Kb; legend entries 0%, 25%, 50%; vertical axis from -20% to 70%]

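The relative difference plotted here is simple to compute. The sketch below uses hypothetical EDP values, and the function name `relative_difference` is not from the slides.

```python
def relative_difference(unified, split):
    """(unified - split) / unified; positive when the split cache is better."""
    return (unified - split) / unified


if __name__ == "__main__":
    # Hypothetical EDP values in arbitrary units.
    print(f"{relative_difference(10.0, 7.5):.0%}")   # split cache has lower EDP
    print(f"{relative_difference(10.0, 12.0):.0%}")  # split cache has higher EDP
```

Negative results, like the negative ticks on the vertical axis, correspond to cases in which the unified cache is the better choice.
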
Experimental Results: Dif Energy and Dif AccessTime
Dif Energy = (E(C_0) - E(C_1|C_2)) / E(C_0)
Dif AccessTime = (D(C_0) - D(C_1|C_2)) / D(C_0)
[Figure: two panels, Dif Energy and Dif AccessTime, for Conv, FFT, G3fax, JPEG, and MPEG at 8 Kb and 12 Kb; legend entries 0%, 25%, 50%]

Conclusion
For some applications, TSAC-EDPs have better average memory access time, energy, and energy-delay product than unified caches of the same size.
Parallel accesses to the cache partitions, when possible, can further improve the average memory access EDP, time, and energy.
The concept of array concentration can be applied to other design methods.
