Improving Data Access Efficiency by Using Context-Aware Loads and Stores
Alen Bardizbanyan, Magnus Själander†, David Whalley‡, Per Larsson-Edefors
Chalmers University of Technology, †Uppsala University, ‡Florida State University
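Conventional L1 DC Access
[Figure: pipeline diagram with stages ADDR-GEN, SRAM-ACCESS, and DATA-FORMATTING. The AGU combines the base address and offset; the DTLB, tag arrays (TAG-0 to TAG-N), and data arrays (DATA-0 to DATA-N) are followed by way select, format data, and forward data, feeding the execution units, other forwarding paths, and register file writeback.]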
Energy Usage of a 4-way L1 Data Cache
[Figure: 4-way set-associative L1 data cache. The address is split into virtual page number (VPN), set index, and line offset. The DTLB translates the VPN into a physical page number (PPN), which is compared against the tags read from Tag Array-0 to Tag Array-3. The resulting Way Select-0 to Way Select-3 signals drive the way select logic, which chooses among Data-0 to Data-3 from Data Array-0 to Data Array-3 to produce Data Out. The figure annotates contributions of 10%, 30%, and 60% to the overall L1 load access energy.]
60% of the energy is due to reading the data memories in parallel.
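As a back-of-the-envelope illustration of why the 60% figure matters, the sketch below estimates the load access energy when fewer than four data arrays are read. It assumes, beyond what the slide states, that the four data arrays contribute equally to the 60% and that the remaining energy is unchanged. The function name and its parameters are hypothetical.

```python
def relative_load_energy(data_fraction: float = 0.60,
                         ways: int = 4,
                         arrays_read: int = 1) -> float:
    """Estimate load energy relative to a conventional access (1.0).

    Assumption (not from the slides): data-array energy scales linearly
    with the number of arrays read, and all other energy is unchanged.
    """
    if not 0 < arrays_read <= ways:
        raise ValueError("arrays_read must be between 1 and ways")
    non_data = 1.0 - data_fraction  # everything that is not a data-array read
    return non_data + data_fraction * arrays_read / ways


if __name__ == "__main__":
    # Reading one data array instead of four, under the assumptions above.
    print(f"{relative_load_energy():.2f}")
```

Under these assumptions, reading a single data array would bring the load access energy to roughly 55% of the conventional access. Real savings depend on overheads this simple model ignores.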
Energy Breakdown of an Embedded Processor
[Figure: energy breakdown chart.]
Clock: 12.1%
L1 DC: 21.7%
Pipeline: 31.4%
L1 IC: 34.8%
Phased L1 DC Access
[Figure: pipeline diagram with stages ADDR-GEN, DTLB/TAG-ACCESS, DATA-ACCESS, and DATA-FORMATTING. The DTLB and tag arrays (TAG-0 to TAG-N) are accessed one stage before the data arrays (DATA-0 to DATA-N), followed by way select, format data, and forward data.]
Phased L1 DC Performance Overhead
[Figure: bar chart of normalized execution time (axis from 1 to 1.16) for the benchmarks adpcm, basicmath, bitcount, blowfish, crc, dijkstra, fft, gsm, ispell, jpeg, lame, patricia, pgp, qsort, rijndael, rsynth, sha, stringsearch, susan, and tiff, plus their average.]
Average performance overhead of 8%.
Context Aware Loads: Case0
[Figure: pipeline diagram with stages ADDR-GEN, SRAM-ACCESS, and DATA-FORMATTING, the same stages as the conventional L1 DC access.]
r[2] = M[r[4]+76]
r[3] = r[3]+r[2]
Context Aware Loads: Case1
[Figure: animation over three slides. The pipeline diagram starts with stages ADDR-GEN, SRAM-ACCESS, and DATA-FORMATTING and ends with only DATA-ACCESS and DATA-FORMATTING; the AGU, base address, and offset are no longer shown.]
r[2] = M[r[4]]
r[3] = r[3]+r[2]
Context Aware Loads: Case2
[Figure: animation over three slides. The pipeline diagram ends with stages ADDR-GEN, DATA-ACCESS, and DATA-FORMATTING, with the AGU combining the base address and offset.]
r[2] = M[r[4]+4]
r[3] = r[3]+r[2]
Context Aware Loads: Case3
[Figure: animation over three slides. The pipeline diagram ends with stages ADDR-GEN, DTLB/TAG-ACCESS, DATA-ACCESS, and DATA-FORMATTING.]
r[2] = M[r[4]+4]
<3 or more insts>
r[3] = r[3]+r[2]
Context Aware Loads: Cases 0-3
[Figure: the four pipeline diagrams for Case0 to Case3 shown together.]
Case0: Normal Access. Stages: ADDR-GEN, SRAM-ACCESS, DATA-FORMATTING.
    r[2] = M[r[4]+76]
    r[3] = r[3]+r[2]
Case1: Avoids Stalls. Stages: DATA-ACCESS, DATA-FORMATTING.
    r[2] = M[r[4]]
    r[3] = r[3]+r[2]
Case2: 1x Data Array Access. Stages: ADDR-GEN, DATA-ACCESS, DATA-FORMATTING.
    r[2] = M[r[4]+4]
    r[3] = r[3]+r[2]
Case3: 1x Data Array Access, No Tag Speculation. Stages: ADDR-GEN, DTLB/TAG-ACCESS, DATA-ACCESS, DATA-FORMATTING.
    r[2] = M[r[4]+4]
    <3 or more insts>
    r[3] = r[3]+r[2]
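The four cases can be read as a classification of each load by its displacement and by how soon its result is used. The sketch below is one possible reading, not the authors' selection logic: the slides give a single example per case, so the precedence of the tests, the `SMALL_DISPLACEMENT_LIMIT` threshold, and all names are assumptions made for illustration.

```python
from enum import IntEnum


class LoadCase(IntEnum):
    NORMAL_ACCESS = 0                  # Case0: Normal Access
    AVOIDS_STALLS = 1                  # Case1: Avoids Stalls
    ONE_DATA_ARRAY = 2                 # Case2: 1x Data Array Access
    ONE_DATA_ARRAY_NO_TAG_SPEC = 3     # Case3: 1x Data Array Access, No Tag Speculation


# Hypothetical threshold: the slides show +4 as "small" and +76 as "large"
# but do not state where the boundary lies.
SMALL_DISPLACEMENT_LIMIT = 32

# From the Case3 slide: "<3 or more insts>" between the load and its first use.
FAR_USE_DISTANCE = 3


def classify_load(displacement: int, insts_before_use: int) -> LoadCase:
    """Pick a case from the displacement and the load-to-use distance.

    insts_before_use counts the instructions between the load and the
    first instruction that reads the loaded register (0 = immediate use).
    The order of the tests is an assumption.
    """
    if insts_before_use >= FAR_USE_DISTANCE:
        return LoadCase.ONE_DATA_ARRAY_NO_TAG_SPEC
    if displacement == 0:
        return LoadCase.AVOIDS_STALLS
    if 0 < displacement < SMALL_DISPLACEMENT_LIMIT:
        return LoadCase.ONE_DATA_ARRAY
    return LoadCase.NORMAL_ACCESS


if __name__ == "__main__":
    # The four slide examples: (displacement, instructions before first use).
    for disp, dist in [(76, 0), (0, 0), (4, 0), (4, 3)]:
        print(disp, dist, classify_load(disp, dist).name)
```

With these assumed thresholds, the four slide examples map to Case0 through Case3 in order. A compiler pass would presumably derive both inputs statically, but the slides do not say how.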
Context Aware Loads: Pipeline
[Figure: pipeline diagram with stages ADDR-GEN, DTLB/TAG-ACCESS, DATA-ACCESS, and DATA-FORMATTING, showing the AGU (base address and offset), DTLB, tag arrays (TAG-0 to TAG-N), data arrays (DATA-0 to DATA-N), way select, format data, forward data, the execution units, other forwarding paths, and register file writeback.]
Strided Accesses: Strided Access Structure
L3: r[2]=M[r[4]];
    ...
    r[4]=r[4]+4;
    PC=r[4]!=r[5],L3;

...
r[22]=M[r[sp]+100];
r[21]=M[r[sp]+96];
r[20]=M[r[sp]+92];
...

[Figure: strided access structure with entries numbered 1 to 2^n − 1. Each entry has the fields V, L1 DC tag, L1 DC index, L1 DC way, PP, DV, word offset, and strided data (SD).]
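To make the field list concrete, the sketch below models one entry of the strided access structure as a plain record and builds a table with entries 1 to 2^n − 1. Only the field names come from the slide. The field types, the reading of V and DV as validity flags, and the `matches_line` helper are assumptions added for illustration; PP is carried as an opaque value because the slide does not expand it.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SASEntry:
    valid: bool = False       # V (assumed: entry holds usable information)
    tag: int = 0              # L1 DC tag
    index: int = 0            # L1 DC index
    way: int = 0              # L1 DC way
    pp: int = 0               # PP (not expanded on the slide; kept opaque)
    data_valid: bool = False  # DV (assumed: strided_data is usable)
    word_offset: int = 0      # word offset
    strided_data: int = 0     # strided data (SD)


def make_sas(n: int) -> Dict[int, SASEntry]:
    """Build a table with entries numbered 1 to 2^n - 1, as in the figure."""
    return {i: SASEntry() for i in range(1, 2 ** n)}


def matches_line(entry: SASEntry, tag: int, index: int) -> bool:
    """Hypothetical check: does the entry describe the same L1 DC line?"""
    return entry.valid and entry.tag == tag and entry.index == index


if __name__ == "__main__":
    sas = make_sas(2)  # entries 1, 2, 3
    sas[1] = SASEntry(valid=True, tag=0x1A, index=5, way=2)
    print(sorted(sas), matches_line(sas[1], 0x1A, 5))
```

A dictionary keyed from 1 mirrors the entry numbering in the figure. A hardware table would be indexed directly, and the slides do not state how an entry is selected.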
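Context Aware Loads: Case4
[Figure: datapath diagram without stage labels, showing the DTLB, tag arrays (TAG-0 to TAG-N), data arrays (DATA-0 to DATA-N), a block labeled T+I, way select, format data, forward data, the execution units, other forwarding paths, and register file writeback.]
r[2] = M[r[4]]
r[3] = r[3]+r[2]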
Context Aware Loads: Case5
[Figure: pipeline diagram with stages ADDR-GEN, SRAM-ACCESS, and DATA-FORMATTING, showing the AGU (base address and offset), DTLB, tag arrays (TAG-0 to TAG-N), data arrays (DATA-0 to DATA-N), and a block labeled T+I.]
r[2] = M[r[4]+4]
r[3] = r[3]+r[2]
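Context Aware Loads: Case6
[Figure: pipeline diagram with a single labeled stage, DATA-ACCESS, showing the DTLB, tag arrays (TAG-0 to TAG-N), data arrays (DATA-0 to DATA-N), and a block labeled T+I+SD; way select, format data, and writeback are not shown.]
r[2] = M[r[4]]
r[3] = r[3]+r[2]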
Context Aware Loads: Case7
[Figure: pipeline diagram with stages ADDR-GEN and SRAM-ACCESS, showing the AGU (base address and offset), DTLB, tag arrays (TAG-0 to TAG-N), data arrays (DATA-0 to DATA-N), and blocks labeled T+I and SD.]
r[2] = M[r[4]+4]
r[3] = r[3]+r[2]