Interconnect Power and Delay Optimization by Dynamic Programming in Gridded Design Rules
Konstantin Moiseev, Avinoam Kolodny, EE Dept., Technion, Israel Institute of Technology
Shmuel Wimer, Eng. School, Bar-Ilan University
March 2010, ISPD 2010
Agenda
• What is the problem?
• Results for 32nm design
• It is NP complete
• Optimal solution by dynamic programming
• How it works in practice
• Further research problems
Interconnect Power and Delay
[Figure annotations: line-to-line coupling; signal's activity, $0 \le AF \le 1$]
• Line-to-line coupling is the dynamic power killer
• Using the Elmore delay model: simple, inaccurate, but with high fidelity
Interconnect Bus Model
[Figure: bus model with labels $S_i$, $W_i$, $S_{i+1}$, $A$ and $L$.]
Delay and Dynamic Power Minimization
Delay:
$$D_i(s_{i-1}, w_i, s_i) = \alpha_i + \beta_i w_i + \frac{\gamma_i}{w_i} + \left(\delta_i + \frac{\varepsilon_i}{w_i}\right)\left(\frac{1}{s_{i-1}} + \frac{1}{s_i}\right)$$
$\alpha_i, \beta_i, \gamma_i, \delta_i, \varepsilon_i$: technology parameters, driver's resistance, capacitive load and bus length $L$.
Dynamic power:
$$P_i(s_{i-1}, w_i, s_i) = \kappa_i w_i + \eta_i \left(\frac{1}{s_{i-1}} + \frac{1}{s_i}\right)$$
$\kappa_i, \eta_i$: technology parameters, signal's activity, and bus length $L$.
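The two per-wire models can be sketched in a few lines of Python. This is only an illustration: it assumes the delay has the form alpha + beta*w + gamma/w + (delta + eps/w)*(1/s_prev + 1/s_next) and the power has the form kappa*w + eta*(1/s_prev + 1/s_next), and the function names and coefficient values are hypothetical, not taken from the slides.

```python
def wire_delay(s_prev, w, s_next, alpha, beta, gamma, delta, eps):
    """Delay of one wire of width w with neighbor spaces s_prev and s_next."""
    coupling = 1.0 / s_prev + 1.0 / s_next
    return alpha + beta * w + gamma / w + (delta + eps / w) * coupling


def wire_power(s_prev, w, s_next, kappa, eta):
    """Dynamic power of one wire: width term plus line-to-line coupling term."""
    return kappa * w + eta * (1.0 / s_prev + 1.0 / s_next)


# Illustrative coefficients only (assumed values, not from the slides).
coeffs = dict(alpha=1.0, beta=2.0, gamma=3.0, delta=4.0, eps=5.0)
narrow_delay = wire_delay(2.0, 1.0, 2.0, **coeffs)
wide_delay = wire_delay(2.0, 2.0, 2.0, **coeffs)
narrow_power = wire_power(2.0, 1.0, 2.0, kappa=3.0, eta=4.0)
wide_power = wire_power(2.0, 2.0, 2.0, kappa=3.0, eta=4.0)
```

With these made-up coefficients, widening the wire lowers its delay and raises its power, which is the trade-off the optimization explores.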
Formulation of the Problem
Minimize delay: $D^{\mathrm{sum}} = \sum_{i=1}^{n} D_i(s_{i-1}, w_i, s_i)$ or $D^{\max} = \max_{1 \le i \le n} D_i(s_{i-1}, w_i, s_i)$
Minimize power: $P = \sum_{i=1}^{n} P_i(s_{i-1}, w_i, s_i)$
Subject to
Constrained area: $\sum_{i=1}^{n} w_i + \sum_{i=0}^{n} s_i = A$
Discrete optimization (in 32nm and 22nm): $s_i \in \{S_1, \ldots, S_p\}$, $w_i \in \{W_1, \ldots, W_q\}$
The problem is NP-complete; dynamic programming works.
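To make the formulation concrete, the sketch below enumerates every discrete allocation of a tiny bus and keeps those that meet the area constraint exactly. It is a brute-force reference only (its cost grows exponentially with the number of wires), and the function name, the toy delay and power models, and the numeric values are assumptions made for illustration.

```python
from itertools import product


def brute_force(n, widths, spaces, area, delay_fn, power_fn):
    """Return (P, D_sum, D_max, w, s) for every allocation that fills the area."""
    results = []
    for w in product(widths, repeat=n):            # w[i] is the width of wire i+1
        for s in product(spaces, repeat=n + 1):    # s[0..n] are the n+1 spaces
            if sum(w) + sum(s) != area:
                continue                           # area constraint violated
            delays = [delay_fn(s[i], w[i], s[i + 1]) for i in range(n)]
            power = sum(power_fn(s[i], w[i], s[i + 1]) for i in range(n))
            results.append((power, sum(delays), max(delays), w, s))
    return results


# Toy models (assumed): delay falls with width, power rises with width.
def toy_delay(s_prev, w, s_next):
    return 4 / w + 1 / s_prev + 1 / s_next


def toy_power(s_prev, w, s_next):
    return w + 1 / s_prev + 1 / s_next


allocations = brute_force(2, (1, 2), (1, 2), 6, toy_delay, toy_power)
lowest_power = min(allocations)  # tuples compare by power first
```

Even this two-wire case visits 32 candidate allocations to find 5 feasible ones, which is why an exhaustive search does not scale.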
Power-Delay “Shape Function”
Results Obtained for 32nm
• Implemented in C++ / OpenAccess
• Ran on 32nm control blocks of Intel mobile processor
  – Routed by Synopsys tool
  – Width and space re-allocated in metal 2, 3 and 4
  – Used effective drivers and loads from netlist
  – Typical block size was 250 µm × 250 µm
• Both dynamic power and delays are reduced
• 10%-15% dynamic power reduction
  – Per optimized layer
• 2%-5% delay reduction
MIN_DLYPWR Problem
Question: Is there a setting of widths and spaces such that delay reduction from base is at least $\delta D$, while power increase from base is at most $\delta P$?
MIN_DLYPWR is NPC by polynomial reduction of PARTITION, which attempts to answer whether, for a given set $B$ whose elements have size $s(b) \in \mathbb{Z}^+$, $\forall b \in B$, there's a subset $B' \subseteq B$ satisfying $\sum_{b \in B'} s(b) = \sum_{b \in B - B'} s(b)$.
MIN_MAX_DLYPWR Problem
Question: Is there a setting of widths and spaces such that power decrease from base power is at least $\delta P$, while maximal delay is increasing by at most $\delta D$?
MIN_MAX_DLYPWR is NPC by polynomial reduction of SUBSET_SUM, which answers whether, for a set $B$ whose elements have size $s(b) \in \mathbb{Z}^+$, and given a number $N \in \mathbb{Z}^+$, there is $B' \subseteq B$ satisfying $\sum_{b \in B'} s(b) = N$.
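For readers less familiar with the two source problems, here is a small sketch of the decision questions that PARTITION and SUBSET_SUM ask. It uses the textbook pseudo-polynomial reachable-sums method, which is not part of the slides; the function names are hypothetical.

```python
def subset_sum(sizes, target):
    """Decide whether some subset of the positive integer sizes adds up to target."""
    reachable = {0}  # sums achievable with the elements seen so far
    for s in sizes:
        reachable |= {r + s for r in reachable if r + s <= target}
    return target in reachable


def partition(sizes):
    """Decide whether the sizes can be split into two halves of equal sum."""
    total = sum(sizes)
    return total % 2 == 0 and subset_sum(sizes, total // 2)


can_split = partition([3, 1, 1, 2, 2, 1])   # 3 + 2 = 1 + 1 + 2 + 1
cannot_split = partition([1, 2, 4])         # odd total, no equal split
```

The running time depends on the numeric value of the target, so this does not contradict the NP-completeness of either problem.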
Dynamic Programming Solution
Delay is additive: $D(1,n) = D(1,j) + D(j+1,n)$ (sum of delays) or $D(1,n) = \max\{D(1,j),\, D(j+1,n)\}$ (maximal delay)
Power is additive: $P(1,n) = P(1,j) + P(j+1,n)$
Area is additive: $A(j,n) = A - \left(\sum_{i=0}^{j} w_i + \sum_{i=0}^{j} s_i\right)$
Minimization of power and delay from $j+1$ to $n$ is independent of power and delay from $1$ to $j$. This suggests dynamic programming.
The algorithm generates only essential $(P, D)$ pairs in progression from wire to wire.
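The additivity property can be sketched as a single combining step. The helper below is hypothetical: it merges the (P, D) pair of wires 1..j with the pair of wires j+1..n, using either the sum-of-delays or the maximal-delay objective.

```python
def combine(left, right, mode="sum"):
    """Merge two (power, delay) pairs of adjacent wire groups."""
    power = left[0] + right[0]                 # power always adds
    if mode == "sum":
        delay = left[1] + right[1]             # sum-of-delays objective
    else:
        delay = max(left[1], right[1])         # maximal-delay objective
    return (power, delay)


total_sum = combine((1.0, 2.0), (3.0, 4.0))          # (4.0, 6.0)
total_max = combine((1.0, 2.0), (3.0, 4.0), "max")   # (4.0, 4.0)
```

Because both objectives combine through an associative operation, partial results for a prefix of wires can be reused unchanged when later wires are added.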
Power-Delay Solution Space
[Figure: power-delay solution space in the $(P, D)$ plane, with $D = D_0$, $P_{\min}$ and $P_{\max}$ marked.]
Dynamic programming finds the red curve progressively.
The optimal solution is derived from the solution space of the last wire.
State Definition for Dynamic Programming
$$\big(A(j+1,n),\; s_j,\; D(A(j+1,n), s_j),\; P(A(j+1,n), s_j)\big)$$
• $A(j+1,n)$: area left for $n-(j+1)$ wires
• $s_j$: rightmost allocated space
• $D(A(j+1,n), s_j)$: accumulated delay
• $P(A(j+1,n), s_j)$: accumulated power
State Dominancy and Redundancy
Allocation $\omega' : (w'_0, s'_0, \ldots, w'_j, s'_j)$ is dominating allocation $\omega'' : (w''_0, s''_0, \ldots, w''_j, s''_j)$ if
$$A - \left(\sum_{i=0}^{j} s'_i + \sum_{i=0}^{j} w'_i\right) \ge A - \left(\sum_{i=0}^{j} s''_i + \sum_{i=0}^{j} w''_i\right),$$
$s'_j \ge s''_j$, and
$$D(\omega') \le D(\omega'') \;\wedge\; P(\omega') \le P(\omega'').$$
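The dominance test can be sketched as a filter over candidate states. The `State` tuple and both functions are hypothetical names; the comparison assumes the rule reads "at least as much area left, at least as large a rightmost space, and no worse delay and power".

```python
from typing import NamedTuple


class State(NamedTuple):
    area_left: float   # area still available for the remaining wires
    s_last: float      # rightmost allocated space
    delay: float       # accumulated delay
    power: float       # accumulated power


def dominates(a, b):
    """True if state a is at least as good as state b in every component."""
    return (a.area_left >= b.area_left and a.s_last >= b.s_last
            and a.delay <= b.delay and a.power <= b.power)


def prune(states):
    """Drop redundant states, i.e. those dominated by a different state."""
    unique = list(dict.fromkeys(states))  # remove exact duplicates, keep order
    return [b for b in unique
            if not any(a != b and dominates(a, b) for a in unique)]


candidates = [
    State(5, 1, 10.0, 3.0),
    State(5, 1, 12.0, 4.0),   # same area and space, worse delay and power
    State(4, 2, 9.0, 5.0),    # less area left but a larger last space
]
survivors = prune(candidates)
```

Only the second candidate is removed: the first and third are incomparable, so both remain as non-redundant states.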
Stage Progression and State Augmentation
[Figure: stage $\Lambda_j$ (wire $j$) is augmented into stage $\Lambda_{j+1}$ (wire $j+1$) by the width and space allocations of the next wire.]
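A minimal sketch of the stage progression follows, for the sum-of-delays objective only. It is a simplification rather than the authors' implementation: states are grouped by the exact pair (area used, last space) and only Pareto-optimal (P, D) pairs are kept per group, instead of applying a cross-state dominance rule. All names and the toy models are assumptions.

```python
def pareto(pairs):
    """Keep only non-dominated (power, delay) pairs."""
    front, best_delay = [], float("inf")
    for p, d in sorted(pairs):          # ascending power, then delay
        if d < best_delay:              # strictly better delay than all cheaper pairs
            front.append((p, d))
            best_delay = d
    return front


def dp_pareto(n, widths, spaces, area, delay_fn, power_fn):
    """Return the power-delay frontier for n wires filling the area exactly."""
    # Stage 0: only the leftmost space s_0 has been allocated.
    stage = {(s0, s0): [(0.0, 0.0)] for s0 in spaces}
    for _ in range(n):                  # augment the stage with the next wire
        nxt = {}
        for (used, s_prev), pairs in stage.items():
            for w in widths:
                for s in spaces:
                    new_used = used + w + s
                    if new_used > area:
                        continue        # allocation no longer fits
                    dp = power_fn(s_prev, w, s)
                    dd = delay_fn(s_prev, w, s)
                    bucket = nxt.setdefault((new_used, s), [])
                    bucket.extend((p + dp, d + dd) for p, d in pairs)
        stage = {key: pareto(vals) for key, vals in nxt.items()}
    final = [pd for (used, _), vals in stage.items() if used == area for pd in vals]
    return pareto(final)


def toy_delay(s_prev, w, s_next):       # assumed toy model
    return 4 / w + 1 / s_prev + 1 / s_next


def toy_power(s_prev, w, s_next):       # assumed toy model
    return w + 1 / s_prev + 1 / s_next


frontier = dp_pareto(2, (1, 2), (1, 2), 6, toy_delay, toy_power)
```

Keying on (area used, last space) is safe because everything a later wire needs from the past is the remaining area and the space to its left.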
Theorem (optimality): Stage $\Lambda_n$ of the DP algorithm contains all the feasible non-redundant, and hence optimal, power-delay pairs that can be obtained by any width and space allocation to $n$ wires.
Theorem: Any power-delay function $f(P, D)$ monotonically increasing in $P$ and $D$ achieves its minimum on the boundary of the power-delay feasible region.
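The second theorem suggests that, once the frontier is known, any monotone objective can be minimized by scanning the frontier alone. The sketch below illustrates this with a hypothetical helper and made-up (P, D) pairs.

```python
def minimize_over(points, f):
    """Return the (power, delay) pair that minimizes f(P, D)."""
    return min(points, key=lambda pd: f(*pd))


frontier = [(5.0, 11.0), (7.0, 10.0)]          # illustrative frontier
interior = frontier + [(7.0, 11.5)]            # plus one dominated point

best_product = minimize_over(frontier, lambda p, d: p * d)
best_weighted = minimize_over(frontier, lambda p, d: p + 3 * d)
same_with_interior = minimize_over(interior, lambda p, d: p + 3 * d)
```

Different monotone objectives may pick different frontier points, but adding a dominated point never changes the answer.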
Modeling Real Layout
[Figure: real layout, with vertices labeled $u_0$ and $u_{n+1}$.]
• Use transitive reduction of the wire visibility graph
• Design rules are transitively closed
• Process wires from left to right in topological order, with appropriate enhancement to power-delay calculations
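The two graph steps named above can be sketched with the standard library (Python 3.9+ for `graphlib`). The graph encoding is an assumption: each key is a wire and its set holds the wires visible to its right, with hypothetical boundary vertices `u0` and `u3`.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def reachable_from(succ, start):
    """All vertices reachable from start through one or more edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in succ.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def transitive_reduction(succ):
    """Drop every edge u->v of a DAG that is implied by a longer path u->...->v."""
    reduced = {}
    for u, outs in succ.items():
        reduced[u] = {
            v for v in outs
            if not any(v in reachable_from(succ, x) for x in outs if x != v)
        }
    return reduced


def left_to_right_order(succ):
    """Topological order in which every wire follows all wires to its left."""
    preds = {u: set() for u in succ}
    for u, outs in succ.items():
        for v in outs:
            preds.setdefault(v, set()).add(u)
    return list(TopologicalSorter(preds).static_order())


# Hypothetical visibility graph: u0 and u3 stand for the left and right boundaries.
visibility = {"u0": {"a", "b", "u3"}, "a": {"b", "u3"}, "b": {"u3"}, "u3": set()}
reduced = transitive_reduction(visibility)
order = left_to_right_order(reduced)
```

In this example the reduction leaves only the edges between directly adjacent wires, so the processing order is the chain from `u0` to `u3`.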
Time and Storage
Time complexity: Storage complexity: ( ) 3 log α β + ε ( ) O pq n n β ε 3 O q n
$\alpha$ and $\beta$ are the max in-degree and out-degree, respectively, of a wire adjacency graph vertex.
Time and storage in practice are manageable due to the power grid, which decomposes the problem into many independent smaller problems.
Further Research Directions
• Filling aware optimization
  – Dynamic programming can generate filling patterns!
  – Line-to-line capacitance can be measured on the spot
• Current algorithm works on P&R style only
  – Enhancement for full-custom design style
  – Cross-hierarchy dynamic programming
• Is “bang-bang” sizing possible?
  – Using two values only is tremendous for litho!
• Simultaneous cell and interconnect resizing
  – Use cell families with same footprint
Thank You!
Backup
NP Completeness Proof of Delay Sum
It is NP since any substitution of a valid guess into the delay and power equations can be checked for a YES or NO answer.
Reduction of PARTITION into MIN_DLYPWR:
1. A wire is allocated for every element of PARTITION.
2. Resistances of drivers and wires are set to 0 and 1, resp., hence wire resistance is not affecting delays.
Coefficients in the delay and power equations are set to zero or one, except load and activity, yielding: $D_b = C_b / w_b$ and $P_b = F_b w_b$.
3. Only one spacing is allowed, hence not affecting the problem.
4. Only two width values are allowed: $\{W_1, W_2\}$.
5. Bus area is set sufficiently large, hence not affecting the problem.
6. Activity factors are set to $s(b) / (W_2 - W_1)$. Capacitive loads are set to $s(b)\, W_1 W_2 / (W_2 - W_1)$.
Delay turns into: $D^{\mathrm{sum}} = \sum_{b \in B} \frac{1}{w_b}\, s(b)\, \frac{W_1 W_2}{W_2 - W_1}$
Power turns into: $P = \sum_{b \in B} w_b\, \frac{s(b)}{W_2 - W_1}$
7. Bounds of power increase and delay reduction are set to $\delta P = \delta D = \sum_{b \in B} s(b) / 2$.
Transformation consumes polynomial time.
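Let the answer to $f(I)$ of MIN_DLYPWR be YES, and let $B'$ be the set of wires widened to $W_2$.
Instance $f(I)$ is set such that $P$ increases and $D$ decreases in $w$. Each widened wire contributes $\delta D_b = \delta P_b = s(b)$, so the delay bound (at least $\delta D$) and the power bound (at most $\delta P$) can only be met with equality. There's a single $P$ and $D$ where:
$$\sum_{b \in B'} \delta D_b = \sum_{b \in B'} \delta P_b = \sum_{b \in B} s(b) / 2.$$
We obtained
$$\sum_{b \in B} s(b) / 2 = \sum_{b \in B'} \delta D_b = \sum_{b \in B'} \left(\frac{1}{W_1} - \frac{1}{W_2}\right) \frac{W_1 W_2}{W_2 - W_1}\, s(b) = \sum_{b \in B'} s(b),$$
implying that $(B', B - B')$ solves PARTITION.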
Conversely, let $B' \subseteq B$ be a YES answer to PARTITION.
Set the width of $w_b$, $b \in B'$, to $W_2$, and the rest of the wires stay $W_1$.
$D$ reduction is $\delta D_b = C_b (1/W_1 - 1/W_2) = s(b)$, and $P$ increase is $\delta P_b = F_b (W_2 - W_1) = s(b)$, thus yielding a YES answer to the MIN_DLYPWR problem. Q.E.D.
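The arithmetic of the reduction can be checked mechanically. The sketch below builds the loads and activity factors from a PARTITION instance and confirms, with exact rational arithmetic, that widening wire b from W1 to W2 saves delay s(b) and costs power s(b). Function names and the sample instance are hypothetical.

```python
from fractions import Fraction


def reduction_instance(sizes, W1, W2):
    """Loads C_b, activity factors F_b and the common bound for PARTITION sizes s(b)."""
    W1, W2 = Fraction(W1), Fraction(W2)
    loads = [s * W1 * W2 / (W2 - W1) for s in sizes]       # C_b
    activities = [s / (W2 - W1) for s in sizes]            # F_b
    bound = Fraction(sum(sizes), 2)                        # delta_P = delta_D
    return loads, activities, bound


def widening_effect(loads, activities, W1, W2):
    """Per-wire delay reduction and power increase when w_b goes from W1 to W2."""
    W1, W2 = Fraction(W1), Fraction(W2)
    delay_gain = [C * (1 / W1 - 1 / W2) for C in loads]    # D_b = C_b / w_b
    power_cost = [F * (W2 - W1) for F in activities]       # P_b = F_b * w_b
    return delay_gain, power_cost


sizes = [3, 1, 1, 2, 2, 1]                                 # sample PARTITION instance
loads, activities, bound = reduction_instance(sizes, 1, 3)
delay_gain, power_cost = widening_effect(loads, activities, 1, 3)

widened = [0, 3]                                           # indices of B' = {3, 2}
total_gain = sum(delay_gain[i] for i in widened)
total_cost = sum(power_cost[i] for i in widened)
```

Here the subset {3, 2} sums to half of the total, so widening exactly those two wires meets both bounds with equality.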