Low Power Design Prof. Dr. J. Henkel CES - Chair for Embedded Systems KIT, Germany V. Low Power Software and Compiler Prof. Jörg Henkel, Low Power Design, SS2014 ces.itec.kit.edu
Overview
Components consuming power:
- hardware
- memory
- interconnect
- OS and software
- battery issues
Levels of abstraction:
- system
- RTL
- gate
- transistor
Tasks:
- Optimize (i.e. minimize for low power)
- Design / co-design (synthesize, compile, …)
- Estimate and simulate

Overview
- Software power analysis/measurement
- Software power estimation models
- Optimizing software for low power through the compilation phase
- Instruction scheduling
- Compiler-driven DVS

Low-Power Software: Overview (src: A. Raghunathan, NEC)
Tool flow (figure): source code -> target-independent optimizations -> code generation -> assembler/linker -> target memory image -> ISS, debugger, HW co-simulator
Inputs to the flow:
- Target architecture model
- Instruction-level power model
- System software: RTOS, device drivers, …
- Libraries
Low-power techniques along the flow:
- Power-efficient source code: transformations
- Low-power compilers: code generation, memory layout, code compression
- Low-power OS, middleware: power management, voltage/clock speed scheduling

Instruction-level SW power modeling (src: A. Raghunathan, NEC) [src: Tiwari]
Energy consumed = f(instruction sequence)
Model using
a) per-instruction costs,
b) circuit state overhead costs, and
c) penalties for pipeline stalls and cache misses:

$$E_{\text{program}} = \sum_{I} \text{Base}_{I} \cdot N_{I} + \sum_{I,J} \text{Ovhd}_{I,J} \cdot N_{I,J} + N_{CM} \cdot \text{Penalty}_{CM} + N_{Stall} \cdot \text{Penalty}_{Stall}$$

- $N_{I}$: number of times instruction I is executed
- $\text{Base}_{I}$: base energy cost of I (ignores stalls, cache misses)
- $N_{I,J}$: number of times I and J are executed adjacently
- $\text{Ovhd}_{I,J}$: circuit state overhead when I, J are adjacent
- $N_{CM}$, $\text{Penalty}_{CM}$: number of cache misses and cache miss penalty
- $N_{Stall}$, $\text{Penalty}_{Stall}$: number of pipeline stalls and pipeline stall penalty
Circuit state overhead depends on the processor architecture:
- constant value for 486DX2, Fujitsu SPARClite
- table for Fujitsu DSP due to greater variation

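The cost formula above can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used in [Tiwari]: the function name `program_energy`, the instruction names, and all numeric costs are hypothetical. The counts $N_I$ and $N_{I,J}$ are taken from an executed instruction trace, and pairs without a table entry are assumed to have zero overhead.

```python
from collections import Counter

def program_energy(trace, base, ovhd, n_cache_miss=0, penalty_cm=0.0,
                   n_stall=0, penalty_stall=0.0, default_ovhd=0.0):
    """Evaluate the instruction-level energy model for one executed trace.

    trace: list of executed instruction names, in execution order
    base:  dict instruction -> base cost (Base_I)
    ovhd:  dict (I, J) -> circuit state overhead for adjacent I, J (Ovhd_I,J)
    """
    n_i = Counter(trace)                    # N_I
    n_ij = Counter(zip(trace, trace[1:]))   # N_I,J for adjacent pairs
    base_term = sum(base[i] * n for i, n in n_i.items())
    ovhd_term = sum(ovhd.get(pair, default_ovhd) * n for pair, n in n_ij.items())
    return (base_term + ovhd_term
            + n_cache_miss * penalty_cm
            + n_stall * penalty_stall)

# Hypothetical costs, chosen only to make the arithmetic easy to follow.
base = {"mov": 2.0, "add": 3.0}
ovhd = {("mov", "add"): 0.5, ("add", "mov"): 0.5}
trace = ["mov", "add", "add", "mov"]

energy = program_energy(trace, base, ovhd,
                        n_cache_miss=1, penalty_cm=10.0,
                        n_stall=2, penalty_stall=1.0)
print(energy)
```

For a processor with a constant circuit state overhead (as stated for the 486DX2 and SPARClite), `ovhd` could be left empty and `default_ovhd` set to that constant.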
Building instruction-level power models (src: A. Raghunathan, NEC) (src: [Tiwari])
Characterize the current drawn by the CPU for a given instruction sequence:
- Simulation-based methods: simulate program execution on HW models of the CPU
- Physical measurement:
  - put programs in loops
  - get a stable visual reading
Current measurement setup (figure): power supply, digital ampere meter (A), CPU, rest of the system; clock (Clk) shown against the integration period of the ampere meter
Processors investigated: Intel 486DX2, Fujitsu SPARClite, Fujitsu DSP

Estimation Example (src: A. Raghunathan, NEC)

Program                            Cycles   Base cost (mA)
Block B1
main:
  mov bp, sp                       1        285.0
  sub sp, 4                        1        309.0
  mov dx, 0                        1        309.8
  mov word ptr -4[bp], 0           2        404.8
Block B2
L2:
  mov si, word ptr -4[bp]          1        433.4
  add si, si                       1        309.0
  add si, si                       1        309.0
  mov bx, dx                       1        285.0
  mov cx, word ptr _a[si]          1        433.4
  add bx, cx                       1        309.0
  mov si, word ptr _b[si]          1        433.4
  add bx, si                       1        309.0
  mov dx, bx                       1        285.0
  mov di, word ptr -4[bp]          1        433.4
  inc di                           1        297.0
  mov word ptr -4[bp], di          1        560.1
  cmp di, 4                        1        313.1
  jl L2                            3 (1)    405.7 (356.9)
Block B3
L1:
  mov word ptr _sum, dx            1        521.7
  mov sp, bp                       1        285.0
  jmp main                         3        403.8

Values in parentheses for jl L2 apply when the branch is not taken.

Block           Instances
B1              1
B2              4
B3              1
jl L2 (taken)       3
jl L2 (not taken)   1

$\text{BaseCost}_{\text{PROGRAM}} = \sum_{i} \text{BaseCost}_{\text{BLOCK}_i} \cdot \text{Instances}_{\text{BLOCK}_i}$
Estimated base current = BaseCost_PROGRAM / 72 = 369.0 mA (72 is the total number of cycles)
Final estimated current = 369.0 + 15.0 = 384.0 mA (15.0 mA added for circuit state overhead)
Measured current = 385.0 mA
Similar experiments on the 486DX2 and SPARClite are accurate to within 3% [Tiwari]

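The arithmetic of this example can be replayed with a short Python sketch. The cycle counts and base currents are the ones listed on the slide; the variable names are hypothetical, and weighting each instruction's base current by its cycle count is an assumption that is consistent with the division by 72 cycles. The result should land close to the slide's 369.0 mA; small rounding differences are possible.

```python
# (cycles, base current in mA) per instruction, copied from the slide.
B1 = [(1, 285.0), (1, 309.0), (1, 309.8), (2, 404.8)]
B2_BODY = [(1, 433.4), (1, 309.0), (1, 309.0), (1, 285.0), (1, 433.4),
           (1, 309.0), (1, 433.4), (1, 309.0), (1, 285.0), (1, 433.4),
           (1, 297.0), (1, 560.1), (1, 313.1)]
JL_TAKEN = [(3, 405.7)]       # branch back to L2
JL_NOT_TAKEN = [(1, 356.9)]   # loop exit
B3 = [(1, 521.7), (1, 285.0), (3, 403.8)]

# (instruction list, number of times it executes)
blocks = [(B1, 1), (B2_BODY, 4), (JL_TAKEN, 3), (JL_NOT_TAKEN, 1), (B3, 1)]

def block_cost(instrs):
    """Cycle-weighted base cost of one block instance (mA * cycles)."""
    return sum(cycles * current for cycles, current in instrs)

def block_cycles(instrs):
    return sum(cycles for cycles, _ in instrs)

total_cost = sum(block_cost(b) * n for b, n in blocks)
total_cycles = sum(block_cycles(b) * n for b, n in blocks)

base_current = total_cost / total_cycles       # average base current in mA
OVERHEAD_MA = 15.0                             # overhead term used on the slide
final_current = base_current + OVERHEAD_MA

print(total_cycles, round(base_current, 1), round(final_current, 1))
```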
Estimation flow: summary (figure; src: [Tiwari])

Software power optimization: Example (src: A. Raghunathan, NEC) [Tiwari]
Heapsort example with register optimizations
- Original code: generated by lcc
- Optimized code: hand-generated

Compiler-generated code:
    push ebx
    push esi
    push edi
    push ebp
    mov ebp,esp
    sub esp,24
    mov edi,dword ptr 014H[ebp]
    mov esi,1
    mov ecx,esi
    mov esi,edi
    sar esi,cl
    lea esi,1[esi]
    mov dword ptr -20[ebp],esi
    mov dword ptr -8[ebp],edi
  L3:
    mov edi,dword ptr -20[ebp]
    cmp edi,1
    jle L7
    mov edi,dword ptr -20[ebp]
    sub edi,1
    mov dword ptr -20[ebp],edi
    lea edi,[edi*4]
    mov esi,dword ptr 018H[ebp]
    add edi,esi
    mov edi,dword ptr [edi]
    mov dword ptr -12[ebp],edi
    jmp L8
  L7:
    mov edi,dword ptr 018H[ebp]
    mov esi,dword ptr -8[ebp]
    lea esi,[esi*4]
    add esi,edi
    mov ebx,dword ptr [esi]
    mov dword ptr -12[ebp],ebx
    mov edi,dword ptr 4[edi]
    mov dword ptr [esi],edi
    mov edi,dword ptr -8[ebp]
    sub edi,1
    mov dword ptr -8[ebp],edi
    cmp edi,1
    jne L8
    mov edi,dword ptr 018H[ebp]
    mov esi,dword ptr -12[ebp]
    mov dword ptr 4[edi],esi
    jmp L2

Energy-efficient code:
    push ebp
    mov edi,dword ptr 08H[esp]
    mov esi,edi
    sar esi,1
    inc esi
    mov ebp,esi
    mov ecx,edi
  L3:
    cmp ebp,1
    jle L7
    dec ebp
    mov esi,dword ptr 0cH[esp]
    mov edi,dword ptr [edi*4][esi]
    mov ebx,edi
    jmp L8
  L7:
    mov edi,dword ptr 0cH[esp]
    mov esi,dword ptr 4[edi]
    mov ebx,dword ptr [ecx*4][edi]
    mov dword ptr [ecx*4][edi],esi
    dec ecx
    cmp ecx,1
    jne L8
    mov dword ptr 4[edi],ebx
    jmp L2

Reported reductions:
- 9% current reduction
- 24% running time reduction
- 40.6% energy reduction (33% for circle)

Program             sort                  circle
Version             Original   Final      Original   Final
Current (mA)        525.7      486.6      530.2      514.8
Ex. time (µs)       11.02      7.07       7.18       4.93
Energy (10^-6 J)    19.12      11.35      12.56      8.37
Saving                         40.6%                 33.4%

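The energy row of the table follows from current and execution time via E = Vcc · I · T. The sketch below checks this in Python. Two things are assumptions that do not appear on the slide: a supply voltage of 3.3 V, and execution times in microseconds. Together they reproduce the tabulated energies in 10^-6 J. The names `energy_uj` and `saving_percent` are hypothetical.

```python
VCC_VOLTS = 3.3  # assumed supply voltage, not stated on the slide

# (current in mA, execution time in microseconds), from the table
measurements = {
    "sort":   {"original": (525.7, 11.02), "final": (486.6, 7.07)},
    "circle": {"original": (530.2, 7.18),  "final": (514.8, 4.93)},
}

def energy_uj(current_ma, time_us, vcc=VCC_VOLTS):
    """Energy in microjoules: volts * amperes * microseconds."""
    return vcc * (current_ma / 1000.0) * time_us

def saving_percent(original, final):
    return 100.0 * (original - final) / original

results = {}
for program, versions in measurements.items():
    e_orig = energy_uj(*versions["original"])
    e_final = energy_uj(*versions["final"])
    results[program] = (e_orig, e_final, saving_percent(e_orig, e_final))
    print(program, round(e_orig, 2), round(e_final, 2),
          round(results[program][2], 1))
```

Because energy is the product of current and time, a modest drop in average current combined with a shorter running time gives a much larger energy saving than either figure alone suggests.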
A detailed instruction-level power model (src: [Steinke])
- The model separates CPU-internal from CPU-external energy
- It distinguishes between instruction dependency and data dependency:
  a) instruction-dependent cost inside the CPU
  b) data-dependent cost inside the CPU
  c) also considered but not discussed here: power external to the CPU

A detailed instruction-level power model (cont’d) (src: [Steinke])
E_CPU_instr: instruction-dependent costs inside the CPU depend on
- the internal buses carrying the immediate value Imm,
- the register numbers Reg and the values kept within the registers RegVal, and
- the instruction address IAddr.

A detailed instruction-level power model (cont’d) (src: [Steinke])
E_CPU_data: data-dependent costs inside the CPU for n data accesses depend on the data address DAddr, the data itself (Data), and the direction dir (read/write).

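To illustrate how a cost can depend on DAddr, Data and dir, the following Python sketch charges each access for the number of 1 bits on the address and data buses and for the number of bits that toggle relative to the previous access. This functional form (bit weight plus Hamming distance), the zero initial bus state, and every coefficient are assumptions made for illustration only; they are not the formula or the parameters of the model on the slide.

```python
def weight(value):
    """Number of 1 bits in a non-negative integer."""
    return bin(value).count("1")

def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return weight(a ^ b)

# Hypothetical coefficients (arbitrary energy units per bit).
ALPHA_ADDR, BETA_ADDR = 0.1, 0.2
ALPHA_DATA = {"read": 0.1, "write": 0.15}
BETA_DATA = {"read": 0.2, "write": 0.3}

def cpu_data_energy(accesses):
    """Sum an assumed data-dependent cost over n accesses.

    accesses: list of (daddr, data, direction), direction in {"read", "write"}
    """
    energy = 0.0
    prev_addr, prev_data = 0, 0  # assumed initial bus state
    for daddr, data, direction in accesses:
        energy += ALPHA_ADDR * weight(daddr) + BETA_ADDR * hamming(prev_addr, daddr)
        energy += (ALPHA_DATA[direction] * weight(data)
                   + BETA_DATA[direction] * hamming(prev_data, data))
        prev_addr, prev_data = daddr, data
    return energy

accesses = [(0x0F, 0x01, "read"), (0x0F, 0x03, "write")]
print(cpu_data_energy(accesses))
```

Under these assumptions, repeated accesses to the same address toggle no address bits, so they cost less than accesses that jump between distant addresses.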
A detailed instruction-level power model (cont’d): results and parameters
- Parameters of the ARM7TDMI energy model (src: [Steinke])

Low-power compilers (src: A. Raghunathan, NEC)
- Use instruction-level energy costs to guide code generation
  - Minimize memory accesses
  - Utilize registers effectively
  - Reduce context saving
- Processor-specific optimizations
  - Dual memory loads, instruction packing
- Optimize instruction scheduling to reduce activity in specific parts of the system
  - Internal instruction bus, processor-memory bus, instruction register and register decoder
[Tiwari94b, Tiwari96, Su94, Tomiyama98, Mehta97, Kandemir00]

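As a small illustration of using instruction-level costs to guide code generation, the Python sketch below compares two ways of updating a loop counter, using the base currents from the estimation example earlier in this deck. Keeping the counter in register di across iterations is a hypothetical transformation that assumes di is free for the whole loop; all four instructions take one cycle in that table, so the sums are in mA · cycles.

```python
# Base currents in mA, taken from the estimation example (one cycle each).
BASE_MA = {
    "mov di, word ptr -4[bp]": 433.4,
    "inc di": 297.0,
    "mov word ptr -4[bp], di": 560.1,
    "cmp di, 4": 313.1,
}

# Counter kept in memory: load, increment, store, compare.
memory_counter = [
    "mov di, word ptr -4[bp]",
    "inc di",
    "mov word ptr -4[bp], di",
    "cmp di, 4",
]

# Counter kept in a register (assumed possible): increment, compare.
register_counter = ["inc di", "cmp di, 4"]

def sequence_cost(sequence):
    """Sum of base currents times cycles (one cycle per instruction here)."""
    return sum(BASE_MA[instr] for instr in sequence)

cost_memory = sequence_cost(memory_counter)
cost_register = sequence_cost(register_counter)
print(round(cost_memory, 1), round(cost_register, 1),
      round(cost_memory - cost_register, 1))
```

The register variant drops the two memory-operand instructions, which are the most expensive entries in the table, and it also shortens the loop by two cycles per iteration.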
Instruction scheduling for low power
Traditional instruction scheduling strategies reorder instructions in order to:
- avoid pipeline stalls
- improve resource (register file etc.) usage
- increase ILP (instruction-level parallelism), e.g. 'percolation scheduling'
- …
=> main goal: increase performance
Traditional steps for instruction scheduling:
1) partition the program into regions or basic blocks
2) build a control dependency graph (CDG) and a data dependency graph
3) schedule instructions within resource constraints

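Step 3 can be sketched as a greedy list scheduler over one basic block. The Python below is a minimal illustration under stated assumptions, not a scheduler from the cited literature: the data dependency graph is given by hand, the machine issues one instruction per step, and the selection rule (pick the ready instruction with the lowest overhead relative to the previously issued one) is one hypothetical way to bias a traditional scheduler toward low power. The names `list_schedule` and `class_switch_overhead` and the instruction classes are invented for the example.

```python
def list_schedule(instrs, deps, overhead):
    """Greedy list scheduling of one basic block, single issue.

    instrs:   instruction names in original program order
    deps:     dict name -> set of names that must be issued earlier
    overhead: function (previous name or None, candidate name) -> cost
    """
    remaining = list(instrs)
    scheduled = []
    while remaining:
        issued = set(scheduled)
        ready = [i for i in remaining if deps.get(i, set()) <= issued]
        if not ready:
            raise ValueError("dependency graph has a cycle")
        prev = scheduled[-1] if scheduled else None
        # min() keeps the first minimum, so ties fall back to program order.
        best = min(ready, key=lambda cand: overhead(prev, cand))
        scheduled.append(best)
        remaining.remove(best)
    return scheduled

# Hypothetical block: two independent load/ALU pairs.
op_class = {"i1": "load", "i2": "alu", "i3": "load", "i4": "alu"}
deps = {"i2": {"i1"}, "i4": {"i3"}}

def class_switch_overhead(prev, cand):
    """Assumed cost model: 1 when the instruction class changes, else 0."""
    if prev is None:
        return 0
    return 0 if op_class[prev] == op_class[cand] else 1

order = list_schedule(["i1", "i2", "i3", "i4"], deps, class_switch_overhead)
print(order)
```

In this example the original order switches instruction class three times, while the schedule groups the two loads and the two ALU operations and switches only once, without violating either dependency.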