
On line Power Optimization of Data Flow Multi-Core Architecture

PATMOS 2010, Grenoble, France. 20th Int. Workshop on Power And Timing Modeling, Optimization and Simulation. On line Power Optimization of Data Flow Multi-Core Architecture based on Vdd-Hopping for Local DVFS. Pascal Vivet (1), Edith Beigne (1), Hugo …


  1. PATMOS 2010, Grenoble, France: 20th Int. Workshop on Power And Timing Modeling, Optimization and Simulation
     On line Power Optimization of Data Flow Multi-Core Architecture based on Vdd-Hopping for Local DVFS
     Pascal Vivet (1), Edith Beigne (1), Hugo Lebreton (1), Nacer-Eddine Zergainoh (2)
     (1) CEA-Leti, Minatec, Grenoble, France
     (2) TIMA, Grenoble, France
     {pascal.vivet@cea.fr}
     Pascal Vivet - CEA/LETI - PATMOS’2010, Grenoble, France

  2. Introduction
     • Power Consumption Challenge
       - With convergence and growing capacity of mobile systems
       - Not all applications require the peak performance level: plenty of room for power optimization!
       - $E_{op} \sim \frac{1}{2}\,\alpha\, C V^2 N_{cycles}$
     • Power Consumption Issue
       - Must be addressed at all levels, from physical implementation to system level
       - Ex: Dual-Vt, clock-gating, power switches, DPM, DVFS, task mapping, task allocation, etc.
     • Main Power Consumption reduction techniques
       - Dynamic Power Management (DPM)
       - Dynamic Voltage and Frequency Scaling (DVFS)

  3. Context
     • Data Flow like application
       - Pre-determined computation flow, can be modeled as a task graph
       - Ex: video encoding/decoding, telecom baseband modulation, …
     • Heterogeneous Architecture
       - Composed of a mix of “Soft IPs” and “Hard IPs”
       - Interconnected in a Data-Flow manner, using a Network-on-Chip
     • Use an efficient on-chip DVFS technique
       - Using two set-point voltages: “VDD-Hopping”
     • Objective: propose a hardware Local Power Manager
       - Ensures Real Time constraints
       - On-Line Optimization, benefiting from Dynamic Slack Time

  4. Outline
     • Introduction & Context
     • Low Power GALS NoC architecture
     • Local Power Manager and DVFS control
     • Case study on a Telecom Application
     • Conclusion

  5. Low Power GALS NoC Architecture
     • GALS scheme:
       - Independent synchronous islands
       - Interconnected by an asynchronous Network-on-Chip
     • Within each IP unit:
       - Local Clock Generator: provides the local core frequency
       - Power Supply Unit: provides the local core supply
       - Network Interface: handles NoC communications
       - Local Power Manager: controls the low-power mechanisms
     • A main CPU is in charge of global power management
       - Task scheduling, DVFS parameters, …
     • Each IP unit is a fully independent frequency and power domain
       - Local fine-grain power management can be executed during IP computation and communication, independently of the other units
     [E. Beigne & al, NOCS’08, JSSC’09]

  6. VDD-Hopping for DVFS: Principle
     • In most applications, IPs do not need to be at full speed
       - Decrease voltage and frequency to be energy efficient
       - Energy per operation scales with V²
     • Vdd-Hopping
       - DVFS using two set points
       - Use of two PMOS power switches
       - Vhigh (1.2 V), Vlow (0.7 V), or off (0 V)
       - Can easily be integrated in any CMOS circuit
     • Initially proposed by Tokyo Univ. (2000)
       - But was not integrated on chip
       - Details in [S. Miermont, P. Vivet, M. Renaudin, PATMOS07]
       - Similar recent design [Bevan Baas, UC Davis]
     [Figure: electrical power versus computing power, with the operating points (Vlow, Flow), (Vavg, Favg) and (Vhigh, Fhigh)]
     • Hopping operation and LPM control
       - The IP continues computation and communication during voltage transitions
       - Hopping transition < 50 ns
       - VDD-Hopping transitions are controlled by the LPM
     [Figure: timing diagram of the LPM signal, frequency (Fhigh / Flow), Clk, and voltage (Vhigh / Vlow)] [S. Miermont & al, PATMOS’07]
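
To make the V² scaling concrete, here is a minimal Python sketch using the two set points quoted on the slide (1.2 V and 0.7 V). It assumes dynamic energy only, with every cycle costing an energy proportional to V²; the function name `relative_energy` and the cycle counts in the usage note are illustrative, not taken from the presentation.

```python
V_HIGH = 1.2  # volts, Vhigh set point quoted on the slide
V_LOW = 0.7   # volts, Vlow set point quoted on the slide


def relative_energy(n_high: int, n_low: int) -> float:
    """Energy of a task that runs n_high cycles at Vhigh and n_low cycles
    at Vlow, relative to running all of its cycles at Vhigh.

    Assumption: dynamic energy only, proportional to V^2 per cycle
    (leakage and transition costs are ignored).
    """
    total = n_high + n_low
    if total == 0:
        raise ValueError("task must have at least one cycle")
    hopped = n_high * V_HIGH**2 + n_low * V_LOW**2
    return hopped / (total * V_HIGH**2)
```

Under these assumptions a task run entirely at Vlow costs (0.7 / 1.2)², about 0.34 of its energy at Vhigh, and any mix of the two set points falls between that value and 1.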

  7. Power modes and NI features
     • Power Modes: controlled by the Local Power Manager (LPM)

       Power Mode   Behavior
       High         NI and IP are active, at Vhigh supply
       Low          NI and IP are active, at Vlow supply
       Idle         NI only active, IP core clock is gated, at Vlow voltage
       Off          Supply is off (only leakage)

     • Network Interface features:
       - Handles the NoC protocol: packetisation, routing, flow control
       - Controls IP hardware tasks: configuration, execution, interrupt (wake up the LPM when an incoming task arrives, etc.)
     • How to control DVFS and VDD-Hopping from an application perspective? What kind of LPM control?

  8. Outline
     • Introduction & Context
     • Low Power GALS NoC architecture
     • Local Power Manager and DVFS control
     • Case study on a Telecom Application
     • Conclusion

  9. Local DVFS control: main principles
     • Data Flow Heterogeneous architecture
       - The application is mapped and distributed on distinct IPs
       - The data flows are fully handled by hardware (Network Interface)
     • Data Flow Applications
       - With both latency and throughput constraints
       - Global latency control is required to meet the deadline
       - Propose to use Worst Case Execution Time (WCET)
       - On-line optimization makes it possible to benefit from data-dependent computation
       - Trade off dynamic slack time versus energy
     ⇒ Hybrid global and local optimization scheme
     • Off-line global scheduling
       - WCET is computed off line on application traces
     • On-line local control, to offer on-line optimization
       - The Local Power Manager uses VDD-Hopping (two set points only) for DVFS
     [D. Marculescu et al, ASP-DAC’2005] [A. Maxiaguine et al, CODESS’2005]

  10. LPM: NI Synchronization
      • LPM synchronization with NI task execution
        - This is a very generic case: it does not depend on the IP internal computations or structure
        - Various trade-offs in task granularity and hopping frequency to smooth the traffic
      • Timeslot and WCEC constraints:
        $\tau = \frac{N_h}{f_h} + \frac{N_l}{f_l} = \frac{N_h}{f_h} + \frac{N_{wcec} - N_h}{f_l}$
      • Determine the number of cycles for High and Low voltage:
        $N_h = (N_{wcec} - \tau \times f_l)\,\frac{f_h}{f_h - f_l}$, $\quad N_l = N_{wcec} - N_h$
      [Figure: timing diagram of the "configuration loaded" signal, the "core active" signal, and the supply voltage (Vhigh / Vlow) versus time t]
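
The split between high-voltage and low-voltage cycles follows from the timeslot constraint: the task needs N_wcec cycles in total, and N_h cycles at f_h plus N_l cycles at f_l must fit within the timeslot τ. The Python sketch below solves that constraint for N_h and N_l. The function name, the choice of units (nanoseconds and MHz, so that the arithmetic stays in integers), the rounding of N_h upwards, and the frequencies in the usage note are all assumptions made for illustration; the slides do not give them.

```python
def split_cycles(n_wcec: int, tau_ns: int,
                 f_high_mhz: int, f_low_mhz: int) -> tuple[int, int]:
    """Return (n_high, n_low) such that n_high + n_low == n_wcec and
    n_high / f_high + n_low / f_low <= tau.

    Units (an assumption of this sketch): tau in ns, frequencies in MHz,
    so that cycles = tau_ns * f_mhz / 1000.
    """
    if not 0 < f_low_mhz < f_high_mhz:
        raise ValueError("expected 0 < f_low < f_high")
    need = n_wcec * 1000                 # required cycles, scaled by 1000
    if need > tau_ns * f_high_mhz:
        raise ValueError("timeslot too short even at f_high")
    if need <= tau_ns * f_low_mhz:
        return 0, n_wcec                 # the whole task fits at f_low
    # N_h = (N_wcec - tau * f_low) * f_high / (f_high - f_low), rounded up
    # so that the deadline still holds after rounding.
    num = (need - tau_ns * f_low_mhz) * f_high_mhz
    den = 1000 * (f_high_mhz - f_low_mhz)
    n_high = -(-num // den)              # integer ceiling division
    return n_high, n_wcec - n_high
```

For example, with hypothetical clocks of 400 MHz and 200 MHz, a 1000-cycle worst case and a 4000 ns timeslot, `split_cycles(1000, 4000, 400, 200)` gives 400 cycles at Vhigh and 600 at Vlow, which take 1000 ns + 3000 ns and fill the timeslot exactly.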

  11. LPM: IP core synchronization
      • LPM synchronization with the actual core computation
        - Define an atomic task as a sub-task whose number of cycles and number of inputs/outputs are known
        - Additional logic is needed to detect when the IP core really starts
      • Benefit of Vdd-Hopping properties
        - Compute the number of cycles Nh and Nl as before
        - Wait for incoming data at Vlow
        - Start the atomic task at Vlow, do a single Vhigh transition, and go back to Vlow as soon as the atomic task computation is finished
      [Figure: timing diagram of the "task loaded" signal, the "atomic task" signal, and the supply voltage (Vhigh / Vlow) versus time t]

  12. LPM: On-Line optimization
      • When AEC < WCEC, dynamic slack time is available
        - Data-dependent delay, variable communication delay, …
      • On-line optimization to further reduce energy
      • Main principle: reallocate the remaining time of the current task to the next one
      [Figure: supply voltage (Vhigh / Vlow) versus time over timeslots k-1, k, k+1, showing the Compute and Idle phases, the planned cycle counts N_l, N_h and the updated counts N'_l, N'_h]
      • The next timeslot is incremented with the unused cycles:
        $\tau' = \tau + \frac{n_h}{f_h} + \frac{n_l}{f_l}$
      • The next task cycle numbers Nh and Nl are updated (for $f_l < f_h \le 2 f_l$):
        $N'_h = N_h - (n_l + 0.5\, n_h)$, $\quad N'_l = N_{wcec} - N'_h$
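
The slack reallocation can be sketched in a few lines of Python. This sketch reads the slide's update rule as N'_h = N_h - (n_l + 0.5 n_h) and N'_l = N_wcec - N'_h, valid when f_l < f_h <= 2 f_l, where n_h and n_l are the cycles left unused by the previous task at each set point. That reading, the function name, the integer halving, and the clamp at zero are assumptions of the sketch, not details confirmed by the presentation.

```python
def reallocate_slack(n_high: int, n_wcec: int,
                     unused_high: int, unused_low: int) -> tuple[int, int]:
    """Update the next task's (N_h, N_l) using the previous task's slack.

    Assumed rule: N'_h = N_h - (n_l + 0.5 * n_h), N'_l = N_wcec - N'_h,
    for f_low < f_high <= 2 * f_low.
    """
    # Halving with floor division under-counts the slack, which keeps the
    # update on the safe side of the deadline.
    moved = unused_low + unused_high // 2
    new_high = max(0, n_high - moved)    # cannot go below zero high cycles
    return new_high, n_wcec - new_high
```

As a usage example with hypothetical numbers: a next task planned as 400 high and 600 low cycles, following a task that left 100 high and 50 low cycles unused, becomes `reallocate_slack(400, 1000, 100, 50)`, that is 300 high and 700 low cycles. At hypothetical clocks of 400 MHz and 200 MHz this takes 4250 ns, within the enlarged timeslot of 4000 + 250 + 250 = 4500 ns.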

  13. Outline
      • Introduction & Context
      • Low Power GALS NoC architecture
      • Local Power Manager and DVFS control
      • Case study on a Telecom Application
      • Conclusion

  14. Case study: 3GPP LTE Telecom Application
      • 3GPP LTE Application
        - MIMO scheme (2 antennas)
        - 14 OFDM symbols
      [Figure: 3GPP LTE application block diagram. For each antenna (Ant. 1, Ant. 2): synchronization, OFDM symbol demodulation, CFO estimation, CFO correction, channel estimation. Shared blocks: TTI frame buffer, MIMO decoding, soft demodulation, de-interleaving, de-puncturing, turbo decoding, output bits]
      • MAGALI circuit
        - 15 NoC routers
        - Dedicated HW units (OFDM, bit interleaving, …)
        - Generic DSP (Mephisto cores)
        - Memory controllers (SME)
        - 65 nm technology, 30 mm²
      [Figure: task mapping on the GALS NoC architecture, a grid of NoC nodes 00-04, 10-14, 20-24 hosting the NoC interfaces, an ARM core, and the units mc8051_12, mep_10, sme_10w, sme_21, trx_ofdm_20, mep_22, mep_23, asip_24, sme_22s, trx_ofdm_20s, mep_21s, rx_bit_23s] [F. Clermidy et al, ISSCC’2010]
