
Combining Data Remapping and Voltage/Frequency Scaling of Second - PowerPoint PPT Presentation



  1. Combining Data Remapping and Voltage/Frequency Scaling of Second Level Memory for Energy Reduction in Embedded Systems Sudarshan K. Srinivasan, Jun Cheol Park and Vincent J. Mooney III Georgia Institute of Technology {darshan, jcpark, mooney}@ece.gatech.edu

  2. Outline
     - Introduction
     - Motivation
     - Related Work in Power Modeling
     - Experimental Setup
     - Data Remapping
     - Voltage/Frequency Scaling of Off-chip Memory and Bus
     - Experimental Results
     - Conclusion

     Jun Cheol Park, Georgia Institute of Technology. ESCODES, 24 Sep. 2002

  3. Introduction
     - Power/energy is a major issue in embedded systems
     - Mobile devices require longer usage time

  4. Introduction (Cont.)
     - Memory consumes up to 45% of the total system power*
     - Memory is a main target for power/energy reduction
     - Diagram: an embedded system made up of processor, L1 cache, off-chip bus and off-chip memory, divided into non-memory and memory parts

     *P. Panda, N. Dutt, and A. Nicolau. Memory Issues in Embedded Systems-on-Chip: Optimizations and Exploration. Kluwer Academic Publishers, 1999.

  5. Motivation
     - Data Remapping (software technique): reduction in execution time (E.T.) and energy
     - Voltage/Frequency Scaling (hardware technique): energy reduction
     - Diagram: an embedded system with Processor + L1 cache and Off-chip Bus + L2 cache

  6. Related Work in Power Modeling
     - Simplescalar/ARM PowerAnalyzer*
       - Cycle level power/performance simulator
     - SimplePower**
       - Architectural power estimation tool
       - Does not capture the energy of the processor's control unit or clock generation

     * http://www.eecs.umich.edu/~jringenb/power/
     ** http://www.cse.psu.edu/~mdl/software.htm

  7. Experimental Setup
     - Processor Core Energy + Off-chip Bus Energy + L1 and L2 caches Energy = System Energy

  8. Experimental Setup (Cont.)
     - Processor core power
     - Flow: Benchmark Program (C/C++) -> Binary Translation -> ARM9 Based System Architecture RTL Description (Verilog) -> Functional Simulation (VCS) -> Toggle Rate (Activity) Generation -> Processor Core Power, with the synthesized Verilog model as a second input to the power estimate

  9. Experimental Setup (Cont.)
     - Processor core power
     - MARS (Michigan ARM Simulator)
       - A cycle-accurate Verilog model of a RISC processor
       - Capable of running ARM instructions

  10. Experimental Setup (Cont.)
     - Processor core power
     - Collect the toggle rate of internal logic signals using Synopsys VCS simulation
     - Synthesize the Verilog model using the TSMC 0.25 µm library

  11. Experimental Setup (Cont.)
     - Processor core power
     - Estimate power using Synopsys Power Compiler

  12. Experimental Setup (Cont.)
     - Off-chip bus power
     - Bus capacitance obtained from an actual board
       - PCB board with SA110 processor (Skiff board)
     - Flow: Benchmark Program (C/C++) -> Binary Translation -> ARM9 Based System Architecture RTL Description (Verilog) -> Functional Simulation (VCS) -> Toggle Rate (Activity) Generation -> Off-chip Bus Power, with the Skiff board off-chip bus parameters as a second input

  13. Experimental Setup (Cont.)
     - L1 and L2 caches energy
     - TRIMARAN*
       - Integrated compilation and performance monitoring infrastructure
     - TRICEPS
       - ARM-like processor simulator
       - Generates ARM code
       - Execution statistics
     - SMACS (Smart Memory and Cache Hierarchy Simulator)
       - Cache activity statistics
     - Kamble and Ghose model**
     - Diagram: TRIMARAN (TRICEPS and SMACS) produces the execution time and the L1 and L2 caches energy

     *TRIMARAN, http://www.trimaran.org
     **M. Kamble and K. Ghose, "Analytical energy dissipation models for low power caches," Proceedings of the International Symposium on Low Power Electronics and Design, pp. 143-148, Aug. 1997.

  14. Experimental Setup (Cont.)
     - Processor Core Power + Off-chip Bus Power, combined with the Execution Time from TRIMARAN, + L1 and L2 caches Energy -> System Energy
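The composition on slides 7 and 14 can be sketched numerically. This is a minimal sketch, assuming the diagram means that the two power figures are multiplied by the execution time and the cache energy is then added. The function names and every input value below are hypothetical, not the authors' measurements.

```python
def cycles_to_seconds(cycles, clock_hz):
    """Convert a cycle count to seconds for an assumed processor clock."""
    return cycles / clock_hz


def system_energy(core_power_w, bus_power_w, exec_time_s, cache_energy_j):
    """Total energy in joules: (core + bus power) * time + cache energy.

    Assumption: power terms are averages over the whole run, and the
    cache energy is already expressed in joules.
    """
    if exec_time_s < 0:
        raise ValueError("execution time must be non-negative")
    return (core_power_w + bus_power_w) * exec_time_s + cache_energy_j


if __name__ == "__main__":
    # Hypothetical inputs: 800M cycles at an assumed 100 MHz clock.
    t = cycles_to_seconds(800_000_000, 100e6)
    print(system_energy(core_power_w=1.0, bus_power_w=0.5,
                        exec_time_s=t, cache_energy_j=5.0))
```

Keeping the cache term separate mirrors the setup, where cache energy comes from a different tool chain than the two power estimates.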

  15. Data Remapping*
     - A compile-time technique for performance enhancement and energy reduction
     - Remaps data so that items that are more likely to be used together are grouped into the same cache block
     - Enhances spatial locality

     *K. Palem, R. Rabbah, P. Korkmaz, V. Mooney and K. Puttaswamy, "Design Space Optimization of Embedded Memory Systems via Data Remapping," Proceedings of the Languages, Compilers, and Tools for Embedded Systems (LCTES'02), pp. 28-37, June 2002.

  16. Data Remapping (Cont.)
     - Figure: amount of data fetched before and after remapping (traveling salesman problem in the Olden suite)

  17. Data Remapping (Cont.)
     - An item in memory is accessed by initiating a load of the contents of a memory location or address
     - Since a memory access is expensive, a set of adjacent memory locations is loaded at the same time and stored in a cache
     - The set of adjacent memory locations is known as a memory block
       - Blocks do not overlap and have the same size
     - Each address can be mapped to a block in memory
     - Diagram labels: data objects, block, memory
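The address-to-block mapping described on this slide can be written down directly. The following is a minimal sketch, assuming byte addresses and a fixed block size; the 32-byte value and the function names are hypothetical choices for illustration.

```python
BLOCK_SIZE = 32  # bytes per memory block (hypothetical value)


def block_index(address, block_size=BLOCK_SIZE):
    """Map a byte address to the index of the block that contains it."""
    return address // block_size


def block_range(index, block_size=BLOCK_SIZE):
    """Return the first and last byte address covered by a block."""
    start = index * block_size
    return start, start + block_size - 1


if __name__ == "__main__":
    for addr in (0, 31, 32, 100):
        idx = block_index(addr)
        print(addr, "-> block", idx, "covering", block_range(idx))
```

Because the mapping is an integer division, every address falls in exactly one block, which is the "blocks do not overlap and have the same size" property.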

  18. Data Remapping (Cont.)
     - Data reorganization is the relocation of data objects in memory
     - Diagram: data objects moving between blocks of memory during reorganization

  19. Data Remapping (Cont.)
     - Analyze the application's memory access pattern, then remap data
     - Diagram labels: address, memory, block
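To illustrate "analyze the access pattern, then remap", here is a toy sketch. It is not the remapping algorithm from the cited LCTES'02 paper: it assumes a simple first-use ordering as the placement rule and a single-block buffer as the cache model, and all names and sizes are hypothetical.

```python
def remap_by_first_use(trace):
    """Assign each object a new slot in the order it first appears."""
    layout = {}
    for obj in trace:
        if obj not in layout:
            layout[obj] = len(layout)
    return layout


def count_block_fetches(trace, layout, obj_size=8, block_size=32):
    """Count block fetches, assuming only the last block stays cached."""
    fetches = 0
    current = None
    for obj in trace:
        block = (layout[obj] * obj_size) // block_size
        if block != current:
            fetches += 1
            current = block
    return fetches


if __name__ == "__main__":
    # Strided trace over 16 objects: 0, 4, 8, 12, 1, 5, 9, 13, ...
    trace = [base + 4 * k for base in range(4) for k in range(4)]
    original = {obj: obj for obj in range(16)}  # object i sits in slot i
    remapped = remap_by_first_use(trace)
    print("before:", count_block_fetches(trace, original))
    print("after: ", count_block_fetches(trace, remapped))
```

With four 8-byte objects per 32-byte block, the strided trace changes block on every access in the original layout, while the remapped layout places consecutively used objects in the same block.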

  20. Voltage/frequency scaling of off-chip memory and bus*
     - Scaling down the supply voltage of the off-chip bus and memory (L2 cache)
       - P is proportional to V^2
       - Significant energy saving in L2 cache
     - Doubling the memory access latency
       - L2 cache miss rate affects system performance significantly

     *K. Puttaswamy, K. Choi, J. C. Park, V. J. Mooney III, A. Chatterjee and P. Ellervee, "System Level Power-Performance Trade-Offs in Embedded Systems Using Voltage and Frequency Scaling of Off-Chip Buses and Memory," Proceedings of International Symposium on System Synthesis, to appear, October 2002, Kyoto, Japan.
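The quadratic dependence on supply voltage can be made concrete with the operating points shown on slide 21 (3.3 V at 100 MHz scaled to 2.0 V at 50 MHz). This is a minimal sketch that uses only the slide's "P is proportional to V^2" relation plus the standard assumption that dynamic CMOS power also scales linearly with clock frequency. Leakage and any change in switched capacitance are ignored, and the function name is hypothetical.

```python
def dynamic_power_ratio(v_new, v_old, f_new=1.0, f_old=1.0):
    """Ratio of new to old dynamic power, assuming P ~ V^2 * f."""
    if v_old <= 0 or f_old <= 0:
        raise ValueError("reference voltage and frequency must be positive")
    return (v_new / v_old) ** 2 * (f_new / f_old)


if __name__ == "__main__":
    # Voltage-only effect: also the ratio of energy per switching event.
    print(round(dynamic_power_ratio(2.0, 3.3), 3))
    # Voltage and frequency together: ratio of dynamic power.
    print(round(dynamic_power_ratio(2.0, 3.3, 50e6, 100e6), 3))
```

The voltage-only ratio is the one that matters for energy per access: halving the frequency lowers power but also stretches each access, so it does not reduce switching energy on its own.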

  21. Voltage/frequency scaling of off-chip memory and bus (Cont.)
     - Before scaling: Processor Core (100 MHz, 2.75 V) connected to L2 Memory (100 MHz, 3.3 V)
     - After scaling: Processor Core (100 MHz, 2.75 V) connected through a Write Buffer to L2 Memory (50 MHz, 2.0 V)

  22. Experimental Results
     - Two Olden benchmarks (Health and Perimeter) are used
     - The L2 cache and buses are scaled down to a 2 V supply voltage and 50 MHz
     - The benchmarks are remapped and simulated with the 50 MHz L2 cache
     - A system with half-size L1 and L2 caches is simulated
       - Data remapping can achieve the same execution time with half the cache resources

  23. Experimental Results (Cont.)
     Energy-delay with frequency/voltage scaling of memory (FVM) and data remapping (DR) for the Health benchmark (L1 32KB, 16B/line; L2 1MB, 32B/line)

     | Metric | Before DR, FVM | After DR | After FVM | After DR+FVM | After DR+FVM, 1/2 size L1 | After DR+FVM, 1/2 size L2 | After DR+FVM, 1/2 size L1,L2 |
     |---|---|---|---|---|---|---|---|
     | Execution Cycles | 803645821 | 479612138 | 892552982 | 578046486 | 603275469 | 711151104 | 736311686 |
     | Delay (Execution Time) (s) | 8.036 | 4.796 | 8.926 | 5.78 | 6.033 | 7.112 | 7.363 |
     | Energy (J) | 17.076 | 10.360 | 14.316 | 9.274 | 9.468 | 11.158 | 10.134 |
     | Energy*Delay | 137.231 | 49.687 | 127.778 | 53.608 | 57.118 | 79.35 | 74.618 |
     | % Energy Reduction | 0 | 39.33 | 16.16 | 45.69 | 44.55 | 34.66 | 40.65 |
     | % Energy*Delay Reduction | 0 | 63.79 | 6.89 | 60.94 | 58.38 | 42.18 | 45.63 |
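The two percentage rows on this slide follow from the Energy and Delay rows, taking the "Before DR, FVM" column as the baseline. The sketch below recomputes them from the slide's values; the dictionary keys and function name are hypothetical labels, and small differences in the last digit are expected because the slide's inputs are already rounded.

```python
# (delay in seconds, energy in joules) per configuration, from the slide.
RESULTS = {
    "before": (8.036, 17.076),
    "DR": (4.796, 10.360),
    "FVM": (8.926, 14.316),
    "DR+FVM": (5.78, 9.274),
    "DR+FVM half L1": (6.033, 9.468),
    "DR+FVM half L2": (7.112, 11.158),
    "DR+FVM half L1,L2": (7.363, 10.134),
}


def reductions(results, baseline="before"):
    """Return {config: (% energy reduction, % energy*delay reduction)}."""
    base_delay, base_energy = results[baseline]
    base_ed = base_energy * base_delay
    out = {}
    for name, (delay, energy) in results.items():
        energy_red = 100.0 * (1.0 - energy / base_energy)
        ed_red = 100.0 * (1.0 - (energy * delay) / base_ed)
        out[name] = (energy_red, ed_red)
    return out


if __name__ == "__main__":
    for name, (e_red, ed_red) in reductions(RESULTS).items():
        print(f"{name:20s} energy {e_red:6.2f}%  energy*delay {ed_red:6.2f}%")
```

The recomputation makes the trade-off visible: FVM alone saves energy but lengthens the run, so its energy*delay gain is small, while combining it with DR recovers most of the delay.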
