Low Power Design
3. Hardware power optimization and estimation (Hardware Power)
Dr. Z. Wang and Prof. Dr. J. Henkel
CES - Chair for Embedded Systems, Karlsruhe Institute of Technology, Germany
http://ces.itec.kit.edu Z. Wang and J. Henkel, KIT, SS12

Overview
• Components consuming power: hardware, memory, interconnect, OS, software
• Levels of abstraction
  - system
  - RTL
  - gate
  - transistor
• Tasks
  - Optimize (i.e. minimize for low power)
  - Design / co-design (synthesize, compile, …)
  - Estimate and simulate
• Battery issues

Generic HW synthesis flow
[Figure: generic HW synthesis flow] (Src: [Anand98])

Low power HW design flow
• Energy/power needs to be analyzed and optimized at each level of abstraction
• Therefore, appropriate power models for each level are necessary
• Shown in Fig.:
  - a) design flow w/o energy/power
  - b) design flow with energy/power
(Src: [Anand98])

Power consumption in HW
• A more detailed version than in the intro:
  $P_{avg} = P_{sw.cap} + P_{short\text{-}circuit} + P_{leakage} + P_{static}$ (Src: [Anand98])
• In general, four components:
  - Switching capacity power
  - Short-circuit power
  - Leakage power
  - Static power

Switching capacity power
• Caused by parasitic capacitors during switching: Fig. shows C_L, which is the effective capacitance of all parasitic capacitances
• Each transition dissipates $\frac{1}{2} C_L V_{DD}^2$; with switching activity N (transitions per clock cycle) and operating frequency f:
  $P_{sw.cap} = \frac{1}{2} \cdot C_L \cdot V_{DD}^2 \cdot N \cdot f$
• Means: a) reduce operating frequency, b) reduce C_L, c) reduce voltage, d) reduce switching activity
• Most common: reduce voltage
• => Problem: the gate delay t_d increases too!
  $t_d = \frac{k \cdot C_L \cdot V_{DD}}{(V_{DD} - V_{th})^2}$
[Figure: CMOS inverter] (Src: [Anand98])

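The trade-off between the two formulas above can be checked numerically. The following is a minimal sketch; the function names and all numeric values (C_L, V_th, N, f, and the two supply voltages) are illustrative assumptions, not values from the lecture.

```python
def switching_power(c_load, v_dd, activity, freq):
    """P_sw.cap = 1/2 * C_L * V_DD^2 * N * f."""
    return 0.5 * c_load * v_dd ** 2 * activity * freq


def gate_delay(c_load, v_dd, v_th, k=1.0):
    """t_d = k * C_L * V_DD / (V_DD - V_th)^2, with k a technology constant."""
    if v_dd <= v_th:
        raise ValueError("V_DD must be larger than V_th")
    return k * c_load * v_dd / (v_dd - v_th) ** 2


# Hypothetical example values, chosen only for illustration.
C_L, V_TH, N, F = 1e-12, 0.7, 0.5, 100e6

# Halve the supply voltage and compare power and delay.
power_ratio = switching_power(C_L, 1.65, N, F) / switching_power(C_L, 3.3, N, F)
delay_ratio = gate_delay(C_L, 1.65, V_TH) / gate_delay(C_L, 3.3, V_TH)
print(f"power ratio: {power_ratio:.2f}, delay ratio: {delay_ratio:.2f}")
```

With these assumed values, halving V_DD cuts the switching power to a quarter, while the gate delay grows by more than a factor of three.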
Short circuit power
• Explanation:
  - Caused by a direct supply-to-ground path
  - When the CMOS inverter in the Fig. changes from 1->0, there is a short time frame within which both the nMOS and the pMOS transistor are conducting => a short-circuit current is drawn from the power supply
(Src: [Anand98])

Leakage power
• Leakage can be divided into two components
• First component:
  - I_diode: refers to the diodes that are formed between diffusion regions and substrate (Src: [Anand98])
• Second component:
  - 'off' transistors still conduct some current
  - K, S: technology parameters; W_eff: effective transistor channel width
• NOTE: leakage power is predicted to be dominant in future silicon technologies

Static power
• Not relevant in CMOS circuits
• Note: in some literature, leakage power is denoted as "static power"
• Static power: only relevant in some nMOS circuits where there is a constant supply-to-ground path

Power consumption in HW: breakdown
• Leakage power will dominate in future (<100nm) silicon technologies
• One means to reduce leakage power is to deploy dielectrics with a high k-value
[Figure: power breakdown over time, 1990 to 2020, logarithmic vertical axis from 10^-6 to 100; curves: dynamic power, subthreshold leakage, gate-oxide leakage, and the trajectory if high-k dielectrics reach production] (Src: [Heer04])

Hardware synthesis for low power
• Considered here: high-level synthesis (HLS), e.g.:
  - Operator scheduling
  - Module selection
  - Glitch power reduction
  - State transition reduction
  - …

Operator scheduling for low power
• What is scheduling in the context of high-level synthesis?
  - Scheduling assigns operations in the behavioral description to control steps or controller states. It determines the cycle-by-cycle behavior, i.e. the sequence in which operations are performed.
• Some repetition from ESI:
  - multicycling (clock period is rather short)
  - chaining (clock period is rather long)
  - finding the right clock cycle time is an optimization task in itself
• Scheduling determines the sequence in which the various operations of the behavioral description are performed, and also dictates which operations and variables can share the same functional units and registers. Thus, scheduling can be used to enable resource sharing for low power by ensuring that correlated variables and operations with correlated operands are appropriately sequenced so that they can share the same resources. (Src: [Anand98])

Operator scheduling for low power (cont'd)
• Scheduling can be performed so as to enable maximum resource sharing between operations that belong to instances of the same computational pattern, resulting in maximal exploitation of regularity during resource sharing.
• Scheduling can be used to distribute the slacks or mobilities of the various operations in the DFG appropriately, so that some operations may be performed using slower, more energy-efficient functional units. Thus, scheduling has an impact on the power trade-offs through module selection.
• Scheduling determines the distribution of operations over time, and hence affects the profile of the power consumption in the implementation over time (control steps or clock cycles). Reducing peak power is important due to packaging, cooling, and reliability considerations. The effect of scheduling on peak power will be illustrated later. (Src: [Anand98])
• => These tasks will be discussed in the following (some in the context of module selection).

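The effect of scheduling on the power profile over control steps can be illustrated with a small sketch. The operation names, the per-operation power values, and the two schedules are hypothetical; the assumed data flow graph has the dependencies mul1 -> add1 and mul2 -> add2, and both schedules fit into three control steps.

```python
from collections import defaultdict


def power_profile(schedule, op_power):
    """Sum the power of all operations assigned to each control step."""
    profile = defaultdict(float)
    for op, step in schedule.items():
        profile[step] += op_power[op]
    return dict(profile)


# Hypothetical per-operation power (arbitrary units).
op_power = {"mul1": 8.0, "mul2": 8.0, "add1": 2.0, "add2": 2.0}

# Schedule A: everything as soon as possible.
asap = {"mul1": 1, "mul2": 1, "add1": 2, "add2": 2}
# Schedule B: mul2 and add2 use their slack and move one step later.
balanced = {"mul1": 1, "mul2": 2, "add1": 2, "add2": 3}

peak_asap = max(power_profile(asap, op_power).values())
peak_balanced = max(power_profile(balanced, op_power).values())
print(peak_asap, peak_balanced)
```

Both schedules execute the same operations, so the summed power over all steps is identical; only its distribution over the control steps, and hence the peak, differs.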
Operator scheduling for LP (cont'd)
$t_d = \frac{k \cdot C_L \cdot V_{DD}}{(V_{DD} - V_{th})^2}$
[Figure: normalized dependency t_d = f(V_DD); axes: t_d over V_DD] (src: [Saraff95])
Basic idea: use slack in a data flow graph (dependent upon timing constraints) and:
a) vary V_DD of the ALU where the operator is to be executed, or
b) assign operator(s) to a different ALU with a lower/higher (fixed) V_DD

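The basic idea can be sketched as follows: normalize the delay to a reference voltage (the constants k and C_L cancel), then pick the lowest supply voltage whose stretched delay still fits into the time available to an operator. The reference voltage of 5.0 V, the threshold voltage of 0.8 V, and the candidate voltages are assumptions made for illustration only.

```python
def normalized_delay(v_dd, v_ref=5.0, v_th=0.8):
    """Delay at v_dd relative to the delay at v_ref, from t_d ~ V_DD / (V_DD - V_th)^2."""
    def t(v):
        return v / (v - v_th) ** 2
    return t(v_dd) / t(v_ref)


def lowest_feasible_voltage(voltages, base_delay, time_budget, v_ref=5.0, v_th=0.8):
    """Return the lowest voltage whose delay fits into time_budget, or None."""
    feasible = [
        v for v in voltages
        if v > v_th and base_delay * normalized_delay(v, v_ref, v_th) <= time_budget
    ]
    return min(feasible) if feasible else None


# An operator takes 1.0 time unit at 5.0 V and may take up to 2.0 units
# (one unit of slack); 3.3 V still fits, 2.4 V is too slow.
choice = lowest_feasible_voltage([5.0, 3.3, 2.4], base_delay=1.0, time_budget=2.0)
print(choice)
```

Without slack (a budget of 1.0) the same call would keep the operator at 5.0 V, which matches option a) and b) above: only operators with slack are candidates for a lower V_DD.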
Operator scheduling for LP (cont'd)
Problem:
Obtain a mapping $\tau: V \rightarrow S$ of a data flow graph G = (V, E), given a base execution time t_c (or V_DD) and a timing constraint k · t_c:
  minimize $\sum_{v_i \in V} \tau(v_i)^2$
  such that the critical path length of the DFG is <= k · t_c,
where $S = \{V_{c1}, V_{c2}, \ldots, V_{ci}\}$ is the set of available supply voltages. (src: [Saraff95])

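To make the problem statement concrete, the sketch below evaluates the objective and the timing constraint for a given mapping tau. The example graph, the delay model (each node takes t_c at a 5.0 V reference and is stretched according to V_DD / (V_DD - V_th)^2 with V_th = 0.8 V), and all names are assumptions for illustration, not part of [Saraff95].

```python
def node_delay(v_dd, t_c=1.0, v_ref=5.0, v_th=0.8):
    """Execution time of one node at v_dd, assuming it takes t_c at v_ref."""
    scale = (v_dd / (v_dd - v_th) ** 2) / (v_ref / (v_ref - v_th) ** 2)
    return t_c * scale


def critical_path(succ, tau, t_c=1.0):
    """Length of the longest path in the DFG under voltage mapping tau."""
    memo = {}

    def longest_from(v):
        if v not in memo:
            tail = max((longest_from(w) for w in succ[v]), default=0.0)
            memo[v] = node_delay(tau[v], t_c) + tail
        return memo[v]

    return max(longest_from(v) for v in succ)


def objective(tau):
    """Sum of squared supply voltages over all nodes."""
    return sum(v ** 2 for v in tau.values())


def is_feasible(succ, tau, k, t_c=1.0):
    return critical_path(succ, tau, t_c) <= k * t_c + 1e-9


# Hypothetical DFG: a -> b -> c is the critical chain, d -> c has slack.
succ = {"a": ["b"], "b": ["c"], "c": [], "d": ["c"]}
all_high = {v: 5.0 for v in succ}
lower_d = {**all_high, "d": 3.3}   # lower the node that has slack
lower_a = {**all_high, "a": 3.3}   # lower a node on the critical path

for name, tau in [("all_high", all_high), ("lower_d", lower_d), ("lower_a", lower_a)]:
    print(name, objective(tau), is_feasible(succ, tau, k=3))
```

Under these assumptions, lowering the voltage of d reduces the objective while keeping the critical path within k · t_c, whereas lowering a violates the constraint.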
Operator scheduling for LP (cont'd) - algorithm
• Step 1: initialization
• Step 2: computing slack
  - l(v): longest path of the graph that goes through node v
(src: [Saraff95])

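One plausible reading of Step 2 is sketched below: l(v) is computed as the longest path ending in v plus the longest path starting in v (counting v once), and the slack of v is taken as the deadline k · t_c minus l(v). This slack definition, the example graph, and the unit delays are assumptions; the slide itself only defines l(v).

```python
def slacks(succ, delay, deadline):
    """Slack per node: deadline minus the longest path through that node."""
    pred = {v: [] for v in succ}
    for v, successors in succ.items():
        for w in successors:
            pred[w].append(v)

    def longest(v, neighbours, memo):
        # Longest path that starts at v and follows `neighbours`, including v.
        if v not in memo:
            rest = max((longest(u, neighbours, memo) for u in neighbours[v]), default=0.0)
            memo[v] = delay[v] + rest
        return memo[v]

    up, down = {}, {}
    result = {}
    for v in succ:
        through = longest(v, pred, up) + longest(v, succ, down) - delay[v]
        result[v] = deadline - through
    return result


# Hypothetical DFG with unit delays: a -> b -> c, and d -> c.
succ = {"a": ["b"], "b": ["c"], "c": [], "d": ["c"]}
delay = {v: 1.0 for v in succ}
node_slack = slacks(succ, delay, deadline=3.0)
print(node_slack)
```

In this example only d is off the critical chain, so it is the only node with non-zero slack.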
Operator scheduling for LP (cont'd) - algorithm
• Step 3: compute max slack value
• Step 4: compute dual graph
(src: [Saraff95])

Operator scheduling for LP (cont'd) - algorithm
• Step 5: weight assignment
• Step 6: compute longest weighted path
• Step 7: reassign voltages to nodes in the longest path
(src: [Saraff95])

Operator scheduling for LP (cont'd) - algorithm
• Step 8: go back to step 2
• Conclusion: depending on the constraints, power consumption can be reduced by up to around 25%

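The iterate-until-no-improvement structure of the algorithm can be sketched with a much simpler greedy stand-in. Note that this is not the dual-graph and longest-weighted-path method of [Saraff95], whose details are not given here: the sketch simply starts with every node at the highest voltage and repeatedly lowers one node by one voltage level whenever the critical path still meets k · t_c. The delay model, the voltage set, and the example graph are assumptions.

```python
def delay_at(v_dd, t_c=1.0, v_ref=5.0, v_th=0.8):
    """Node delay at v_dd, assuming t_c at v_ref and t_d ~ V_DD / (V_DD - V_th)^2."""
    return t_c * (v_dd / (v_dd - v_th) ** 2) / (v_ref / (v_ref - v_th) ** 2)


def critical_path(succ, tau, t_c):
    memo = {}

    def longest_from(v):
        if v not in memo:
            tail = max((longest_from(w) for w in succ[v]), default=0.0)
            memo[v] = delay_at(tau[v], t_c) + tail
        return memo[v]

    return max(longest_from(v) for v in succ)


def greedy_voltage_assignment(succ, voltages, k, t_c=1.0):
    """Simplified greedy stand-in: lower voltages while the deadline holds."""
    levels = sorted(voltages, reverse=True)
    tau = {v: levels[0] for v in succ}          # initialization: highest voltage
    changed = True
    while changed:                              # repeat until nothing can be lowered
        changed = False
        for v in sorted(succ):
            idx = levels.index(tau[v])
            if idx + 1 == len(levels):
                continue                        # already at the lowest level
            trial = {**tau, v: levels[idx + 1]}
            if critical_path(succ, trial, t_c) <= k * t_c + 1e-9:
                tau, changed = trial, True
    return tau


# Hypothetical DFG: a -> b -> c is critical, d -> c has slack.
succ = {"a": ["b"], "b": ["c"], "c": [], "d": ["c"]}
assignment = greedy_voltage_assignment(succ, [5.0, 3.3, 2.4], k=3)
print(assignment)
```

For this example, only the node with slack ends up at a lower voltage; the nodes on the critical chain stay at 5.0 V.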