clock skew scheduling


Lagrangian Relaxation Based Gate Sizing with Clock Skew Scheduling: A Fast and Effective Approach. Ankur Sharma, David Chinnery (Mentor, a Siemens Business) and Chris Chu (Iowa State University, Computer Engineering).


  1. Lagrangian Relaxation Based Gate Sizing with Clock Skew Scheduling – A Fast and Effective Approach
     Ankur Sharma, David Chinnery (Mentor, a Siemens Business)
     Chris Chu (Iowa State University, Computer Engineering)

  2. Outline
     ◼ Motivation
       – Previous work
       – Contribution
     ◼ Problem statement
     ◼ Previous approach
     ◼ Our proposed approach
     ◼ Experimental results
     ◼ Conclusion

  3. Motivation
     ◼ Gate sizing is a key circuit optimization technique.
       – Can trade off area, delay, and power
       – Delay-constrained leakage power minimization
     ◼ Skewing the clock arrival allows time borrowing between sequential stages. This is known as useful skew.
     ◼ Time borrowing can be used for:
       – Increasing performance or satisfying delay constraints
       – Creating timing slack to reduce area or power

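  4. Simultaneous Gate Sizing with Skew Scheduling
     [Figure: three flip-flops A, B, and C in series, joined by two combinational paths with delays 16 and 24. Clock period $T = 20$.]
     Clock arrival times and skews:
       – $a_{\mathrm{clk},A} = 0, 20, 40, \ldots$ with $\mathrm{skew}_A = 0$
       – $a_{\mathrm{clk},B} = 4, 24, 44, \ldots$ with $\mathrm{skew}_B = 4$
       – $a_{\mathrm{clk},C} = 0, 20, 40, \ldots$ with $\mathrm{skew}_C = 0$
     ◼ Signal is required to travel within one clock cycle
     ◼ Clock skew alters the required and arrival times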
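
The effect of the skew in this example can be checked numerically. The sketch below assumes zero setup time, and it assumes the 24-unit path runs from A to B and the 16-unit path from B to C. The transcript does not preserve which delay belongs to which path; this is the assignment under which $\mathrm{skew}_B = 4$ meets both constraints. The helper name `setup_slack` is illustrative only.

```python
def setup_slack(T, skew_launch, skew_capture, delay, setup=0):
    """Slack of one register-to-register path.

    Data launched at skew_launch arrives at skew_launch + delay and must be
    stable by the next capturing edge, T + skew_capture - setup.
    """
    return (T + skew_capture - setup) - (skew_launch + delay)


T = 20
# Assumed path assignment (not recoverable from the transcript).
PATH_DELAY = {("A", "B"): 24, ("B", "C"): 16}


def all_slacks(skew):
    """Setup slack of every path for a given skew assignment."""
    return {
        (src, dst): setup_slack(T, skew[src], skew[dst], delay)
        for (src, dst), delay in PATH_DELAY.items()
    }


no_skew = all_slacks({"A": 0, "B": 0, "C": 0})
with_skew = all_slacks({"A": 0, "B": 4, "C": 0})

print(no_skew)    # A->B is short by 4 time units, B->C has 4 to spare
print(with_skew)  # B borrows 4 units from B->C, both paths just meet T
```

Under these assumptions, delaying the clock at B by 4 moves the spare time of the second stage into the first, so both stages meet the period of 20 without resizing any gate.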

  5. Previous Work
     ◼ [Chuang’95] formulated the primal problem as a linear program.
       – Piecewise-linear approximation of convex delays
     ◼ [Roy’07] formulated a Lagrangian dual problem (LDP) and solved the Lagrangian sub-problem simultaneously over size and skew.
       – Assumed continuous sizes and convex delays
     ◼ [Wang’09] transformed the primal problem to eliminate skew variables, then formulated an LDP and maximized the dual.
       – Used a network flow solver to update Lagrange multipliers
       – Optimal for continuous sizes and convex delays
     ◼ [Shklover’12] formulated an LDP with discrete sizes and skews.
       – Focus on clock tree optimization via dynamic programming

  6. Our Contributions
     ◼ Integration of a clock skew scheduler inside an LR gate sizer (EGSS).
       – Our LR formulation preserves the acyclic structure of the timing graph.
       – Modified Lagrange multiplier update to account for skew
       – A new strategy for solving the Lagrangian sub-problem with skew variables
     ◼ For comparison, we extended the dual maximization strategy from [Wang’09] to apply to discrete sizes and non-convex delays (NetFlow).
     ◼ We identify and empirically demonstrate several limitations of realizing primal optimality via dual maximization.
     [Wang’09] J. Wang, D. Das, and H. Zhou. Gate sizing by Lagrangian relaxation revisited. IEEE TCAD 28(7):1071–1084, 2009.

  7. Primal Problem Formulation
     Minimize total leakage power:
     $$\min_{\mathbf{x},\,\mathbf{a},\,\mathbf{w}} \; p(\mathbf{x}, \mathbf{w})$$
     subject to the timing constraints
     $$a_i + d_{ij}(\mathbf{x}) \le a_j, \quad \forall (i,j) \in E$$
     $$a_{d_k} \le T - \mathrm{setup}_k + w_k, \quad \forall k \in FF$$
     $$w_k + d_{\mathrm{clk},q_k} \le a_{q_k}, \quad \forall k \in FF$$
     and the skew bounds
     $$w_{\min} \le w_k \le w_{\max}, \quad \forall k \in FF$$
     where
       – $T$: target clock period
       – $\mathbf{x}$: cell sizes
       – $a_i$: arrival time at node $i$
       – $(i,j)$: timing arc from node $i$ to node $j$
       – $E$: set of all timing arcs
       – $d_{ij}$: delay of the timing arc from node $i$ to node $j$
       – $w_k$: skew at flip-flop $k$
       – $FF$: set of flip-flops

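  8. Timing Graph
     ◼ Graphical representation of timing constraints
     [Figure: a circuit fragment with flip-flop $k$ (pins $D_k$, $Q_k$, $Clk_k$) next to the matching timing graph. Arc labels shown: $d_{ij}$, $\mathrm{setup}_k$, $d_{\mathrm{clk},q_k}$, $w_k$, and $-w_k$. The clock node and the dummy nodes are marked, with $a_I = 0$ and $a_O = T$.]
     Timing constraints represented:
     $$a_i + d_{ij}(\mathbf{x}) \le a_j$$
     $$a_{d_k} + \mathrm{setup}_k - w_k \le T$$
     $$w_k + d_{\mathrm{clk},q_k} \le a_{q_k}$$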

  9. NetFlow – Skew Elimination
     ◼ Due to [Wang’09]. We refer to it as NetFlow.
     Constraints involving the skew $w_k$ ($O$ and $I$ are dummy nodes, with $a_I = 0$ and $a_O = T$):
     $$a_{d_k} + \mathrm{setup}_k - w_k \le T$$
     $$w_k + d_{\mathrm{clk},q_k} \le a_{q_k}$$
     $$w_{\min} \le w_k \le w_{\max}$$
     Rewritten as bounds on $w_k$:
     $$a_{d_k} + \mathrm{setup}_k - T \le w_k \le a_{q_k} - d_{\mathrm{clk},q_k}$$
     $$w_{\min} \le w_k \le w_{\max}$$
     Eliminating $w_k$ leaves:
     $$a_{d_k} + \mathrm{setup}_k - T \le w_{\max}$$
     $$w_{\min} \le a_{q_k} - d_{\mathrm{clk},q_k}$$
     $$a_{d_k} + \mathrm{setup}_k - T \le a_{q_k} - d_{\mathrm{clk},q_k}$$
     ◼ No skews remain, but there are loops in the timing graph.
     [Figure: timing graph before and after elimination; the new arcs are labelled $-T$, $-w_{\max}$, and $w_{\min}$.]
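
The elimination step can be illustrated for a single flip-flop: a skew $w_k$ satisfying all the original constraints exists exactly when the three skew-free inequalities hold (assuming $w_{\min} \le w_{\max}$). The sketch below is only a numerical check of that equivalence; the function names and the sample numbers are hypothetical.

```python
def skew_interval(a_d, a_q, setup, d_clk_q, T, w_min, w_max):
    """Feasible range [lo, hi] for the skew w_k in the original formulation.

    lo comes from  a_d + setup - w_k <= T  and  w_min <= w_k;
    hi comes from  w_k + d_clk_q <= a_q    and  w_k <= w_max.
    The range is empty when lo > hi.
    """
    lo = max(w_min, a_d + setup - T)
    hi = min(w_max, a_q - d_clk_q)
    return lo, hi


def eliminated_constraints_hold(a_d, a_q, setup, d_clk_q, T, w_min, w_max):
    """The three skew-free constraints obtained after eliminating w_k."""
    return (
        a_d + setup - T <= w_max
        and w_min <= a_q - d_clk_q
        and a_d + setup - T <= a_q - d_clk_q
    )


# Hypothetical numbers for one flip-flop.
params = dict(setup=1, d_clk_q=2, T=20, w_min=-5, w_max=5)

ok = skew_interval(a_d=22, a_q=6, **params)    # (3, 4): some skew works
bad = skew_interval(a_d=27, a_q=6, **params)   # (8, 4): no skew works

ok_elim = eliminated_constraints_hold(a_d=22, a_q=6, **params)
bad_elim = eliminated_constraints_hold(a_d=27, a_q=6, **params)
```

In both cases the interval is non-empty exactly when the eliminated constraints hold, which is why the skew variables can be dropped from the optimization and recovered afterwards.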

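  10. NetFlow – Lagrangian Relaxation Formulation
      Primal problem:
      $$\min_{\mathbf{x},\,\mathbf{a}} \; p(\mathbf{x})$$
      subject to
      $$a_i + d_{ij}(\mathbf{x}) \le a_j, \quad \forall (i,j) \in E$$
      $$a_{d_k} + \mathrm{setup}_k - T \le w_{\max}, \quad \forall k \in FF$$
      $$w_{\min} \le a_{q_k} - d_{\mathrm{clk},q_k}, \quad \forall k \in FF$$
      $$a_{d_k} + \mathrm{setup}_k - T \le a_{q_k} - d_{\mathrm{clk},q_k}, \quad \forall k \in FF$$
      $$x_g \in X_g, \quad \forall g \in G$$
      Lagrangian function:
      $$L_{\boldsymbol{\lambda}}(\mathbf{x}) = p(\mathbf{x}) + \sum_{(i,j) \in E} \lambda_{ij} \times \mathrm{cost}_{ij}(\mathbf{x})$$
      Lagrangian relaxation sub-problem (LRS$_\lambda$):
      $$g(\boldsymbol{\lambda}) = \min_{\mathbf{x}} L_{\boldsymbol{\lambda}}(\mathbf{x})$$
      Lagrangian dual problem (LDP):
      $$\max_{\boldsymbol{\lambda} \ge \mathbf{0}} \; g(\boldsymbol{\lambda})$$
      subject to flow conservation:
      $$\boldsymbol{\lambda} \in \Omega = \Big\{ \boldsymbol{\lambda} \;\Big|\; \sum_{i \mid (i,u) \in E} \lambda_{iu} = \sum_{i \mid (u,i) \in E} \lambda_{ui}, \; \forall u \in N \Big\}$$
      where
        – $\mathrm{cost}_{ij}$ is the cost of arc $(i,j)$, i.e. $d_{ij}$, $\mathrm{setup}_k$, etc.
        – $\lambda_{ij}$ is the Lagrange multiplier for timing arc $(i,j)$.
        – $N$ is the set of all nodes in the timing graph.
      ◼ A network flow solver is used to update $\boldsymbol{\lambda}$.
      [Figure: timing graph after skew elimination, with arcs labelled $d_{ij}$, $\mathrm{setup}_k$, $d_{\mathrm{clk},q_k}$, $-T$, $-w_{\max}$, and $w_{\min}$, and $a_I = 0$, $a_O = T$.]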
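
To make the two ingredients of this formulation concrete, the sketch below evaluates the Lagrangian function for fixed arc costs and tests the flow conservation condition at a set of nodes. It is a toy illustration on a hypothetical two-arc chain, not the formulation's solver; `lagrangian` and `flow_conserved` are illustrative names, and the arc costs are assumed to be already evaluated for one sizing solution.

```python
def lagrangian(power, lam, cost):
    """L_lambda(x) = p(x) + sum over arcs of lambda_ij * cost_ij(x).

    power: p(x) for one sizing solution
    lam:   dict mapping arc (i, j) -> multiplier lambda_ij
    cost:  dict mapping arc (i, j) -> cost_ij(x) for the same solution
    """
    return power + sum(lam[arc] * cost[arc] for arc in lam)


def flow_conserved(lam, nodes, tol=1e-9):
    """True if, at every listed node, incoming multipliers sum to outgoing."""
    for u in nodes:
        inflow = sum(v for (i, j), v in lam.items() if j == u)
        outflow = sum(v for (i, j), v in lam.items() if i == u)
        if abs(inflow - outflow) > tol:
            return False
    return True


# Hypothetical chain I -> g -> O with one internal node g.
lam = {("I", "g"): 2.0, ("g", "O"): 2.0}
cost = {("I", "g"): 3.0, ("g", "O"): 4.0}

value = lagrangian(power=5.0, lam=lam, cost=cost)   # 5 + 2*3 + 2*4
balanced = flow_conserved(lam, nodes=["g"])

unbalanced = flow_conserved({("I", "g"): 2.0, ("g", "O"): 1.0}, nodes=["g"])
```

The toy checks conservation only at the internal node `g`; in the slide's formulation the condition is imposed at every node of the timing graph.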

  11. NetFlow – Dual Maximization
      Lagrangian dual problem (LDP):
      $$\max_{\boldsymbol{\lambda} \ge \mathbf{0}} \; g(\boldsymbol{\lambda}) \quad \text{subject to flow conservation constraints on } \boldsymbol{\lambda}$$
      LRS$_\lambda$:
      $$g(\boldsymbol{\lambda}) = \min_{\mathbf{x}} \; p(\mathbf{x}) + \sum_{(i,j) \in E} \lambda_{ij} \times \mathrm{cost}_{ij}(\mathbf{x})$$
      Iteratively,
      ◼ Update $\boldsymbol{\lambda}$, for given $\mathbf{x}$, subject to flow constraints
        – Formulated as a min-cost network flow problem. Expensive in run time.
      ◼ Update $\mathbf{x}$, for given $\boldsymbol{\lambda}$
        – Heuristically solve LRS, a discrete combinatorial optimization problem.
      The focus is dual maximization rather than primal feasibility.
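
The alternation between the two updates can be sketched on a toy problem with one timing constraint and three candidate sizing solutions. Note the substitution: the multiplier update here is a plain projected subgradient step on a single multiplier, not the min-cost network flow update described on the slide, and all numbers are hypothetical.

```python
# Hypothetical sizing solutions: timing violation d(x) - T and power p(x).
VIOLATION = {"A": 2.0, "B": -1.0, "C": -4.0}
POWER = {"A": 1.0, "B": 5.0, "C": 7.0}


def solve_lrs(lam):
    """Update x for given lambda: minimize p(x) + lambda * (d(x) - T).

    Exhaustive search stands in for the heuristic LRS solver; ties go to the
    first solution in dictionary order.
    """
    return min(POWER, key=lambda x: POWER[x] + lam * VIOLATION[x])


def dual_ascent(iterations, step=0.25, lam=0.0):
    """Alternate the x-update and the lambda-update, recording each iterate."""
    trace = []
    for _ in range(iterations):
        x = solve_lrs(lam)
        trace.append((lam, x))
        # Update lambda for given x: raise it on a violation, lower it on
        # slack, and keep it non-negative.
        lam = max(0.0, lam + step * VIOLATION[x])
    return trace


trace = dual_ascent(7)
chosen = [x for _, x in trace]
multipliers = [lam for lam, _ in trace]
```

With this constant step size the iterates cycle between the infeasible solution `A` and the feasible but expensive solution `C`, and the best feasible solution `B` is never selected. This toy is only meant to show why maximizing the dual does not by itself deliver a good primal solution when sizes are discrete.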

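  12. NetFlow – Visualizing Dual Maximization
      For a single-gate circuit:
      $$L_{\lambda}(x) = p(x) + \lambda \times (d(x) - T)$$
      Equation of the line on the $p(x)$ vs. $d(x) - T$ plane:
      $$p(x) = -\lambda_1 \times (d(x) - T) + L_{\lambda_1}(x)$$
      ◼ The slope is $-\lambda$.
      ◼ $L_{\lambda}(x)$ is the intercept on the $p(x)$ axis.
      ◼ To solve LRS$_\lambda$, push the line as low as possible while $x \in X$.
      ◼ Updating $\lambda$ rotates the line around $x_1$.
      ◼ $d(x_1) > T$ is a constraint violation ⇒ increase $\lambda$.
      [Figure: plots of $p(x)$ against $d(x) - T$ with the primal feasible region marked. Lines are drawn for $\lambda_0 = 0$, $\lambda_1$, $\lambda_2$ (slope $-\lambda_2$), and $\lambda^* = \lambda_3$, with $g(0) = p_{\min}$, $g(\lambda_1)$, and $g(\lambda^*) = p^*$ marked on the $p(x)$ axis.]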

  13. NetFlow: Dual Maximization Limitations with Discrete Sizes
      ◼ Duality gap: the dual optimum may not be equal to the primal optimum, $g^* < p^*$.
      ◼ Primal feasibility: at the dual optimum, multiple sizing solutions are possible and some don’t satisfy timing constraints.
        – The dual optimal $g(\lambda_3)$ is realized at $x_3$ as well as $x_4$, but only $x_4$ is primal feasible.
      ◼ Dual optimality is not guaranteed, as the LRS solver is no longer optimal.
      [Figure: sizing solutions $x_1, \ldots, x_5$ and $x^*$ plotted as dots on the $p(x)$ vs. $d(x) - T$ plane, with lines for $\lambda = 0$, $\lambda_2$, $\lambda_3$, and $\lambda_4$; $p_{\min}$, the dual optimum $g^*$, and $p^*$ are marked. Each dot denotes a distinct sizing solution.]
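
The first two limitations can be reproduced on a small hypothetical example with a single timing constraint. The sketch below evaluates the dual function $g(\lambda) = \min_x \, p(x) + \lambda (d(x) - T)$ over three made-up sizing solutions and searches a coarse grid of multipliers; the grid only finds the exact dual optimum because the optimum was chosen to lie on it.

```python
# Hypothetical sizing solutions: name -> (d(x) - T, p(x)).
SOLUTIONS = {"A": (2.0, 1.0), "B": (-1.0, 5.0), "C": (-4.0, 7.0)}


def dual(lam):
    """g(lambda): the lowest Lagrangian value over all sizing solutions."""
    return min(p + lam * v for v, p in SOLUTIONS.values())


def minimizers(lam):
    """All sizing solutions that attain g(lambda)."""
    best = dual(lam)
    return sorted(n for n, (v, p) in SOLUTIONS.items() if p + lam * v == best)


grid = [k / 4 for k in range(17)]        # lambda = 0, 0.25, ..., 4
lam_star = max(grid, key=dual)           # multiplier maximizing the dual
g_star = dual(lam_star)                  # dual optimum on the grid

# Primal optimum: cheapest solution that meets timing, d(x) - T <= 0.
p_star = min(p for v, p in SOLUTIONS.values() if v <= 0)

at_optimum = minimizers(lam_star)
```

Here the dual optimum is 3 while the primal optimum is 5, so $g^* < p^*$. At the maximizing multiplier the Lagrangian is minimized by both `A`, which violates timing, and `C`, which is feasible but costs 7; the primal optimal solution `B` lies above the lower convex hull and is never a minimizer.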

  14. NetFlow: Dual Maximization Limitations with Discrete Sizes
      ◼ Three profiles are shown:
        – Primal cost (blue dash)
        – Dual cost (blue dash-dot)
        – Total negative slack (TNS) (red solid)
      ◼ Dual cost is less than primal cost.
        – The gap is roughly 20% wide; it may partly be due to the duality gap.
      ◼ TNS does not converge to zero.
        – Oscillations prevent convergence
      ◼ Due to discreteness and non-convexity, dual maximization does not guarantee primal feasibility.

  15. Effective Gate Sizer and Skew Scheduler (EGSS)
      ◼ Seamlessly integrates with a state-of-the-art discrete LR gate sizer
      ◼ Re-uses the LRS solver from the discrete LR gate sizer
        – Focus on primal feasibility rather than exact computation of the dual function
        – Extends the LRS solver to iteratively size gates and schedule skews
      ◼ Explicitly updates skews rather than deducing them implicitly
      ◼ Modifies and applies a projection-based Lagrange multiplier update
        – Compared to the min-cost flow solver based multiplier update:
          – Linear runtime complexity, more than an order of magnitude faster
          – Much better convergence
        – Requires the timing graph to be loop-free
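
The slide does not spell out the projection, so the following is only a generic sketch of one common way to restore flow conservation on an acyclic timing graph: visit nodes in reverse topological order and rescale each node's incoming multipliers so that they sum to its outgoing multipliers. It is not claimed to be the authors' exact update; the function name, the graph, and the numbers are hypothetical.

```python
from collections import defaultdict


def project_onto_flow_conservation(lam, topo_order):
    """Rescale multipliers so inflow equals outflow at every internal node.

    lam:        dict mapping arc (u, v) -> non-negative multiplier
    topo_order: nodes of the acyclic timing graph in topological order
    Each arc is read and written a constant number of times, so the pass is
    linear in the size of the graph.
    """
    fanin, fanout = defaultdict(list), defaultdict(list)
    for arc in lam:
        u, v = arc
        fanout[u].append(arc)
        fanin[v].append(arc)

    out = dict(lam)
    # Reverse topological order: a node's outgoing arcs are final before its
    # incoming arcs are rescaled. This needs a loop-free graph.
    for v in reversed(topo_order):
        if not fanin[v] or not fanout[v]:
            continue  # source and sink nodes have nothing to balance
        total_out = sum(out[a] for a in fanout[v])
        total_in = sum(out[a] for a in fanin[v])
        for a in fanin[v]:
            # Keep the relative weights of the incoming arcs; split evenly
            # if they are all zero.
            share = out[a] / total_in if total_in > 0 else 1.0 / len(fanin[v])
            out[a] = total_out * share
    return out


# Hypothetical graph: I -> u -> w -> O and I -> v -> w.
lam = {("I", "u"): 5.0, ("I", "v"): 5.0,
       ("u", "w"): 2.0, ("v", "w"): 6.0, ("w", "O"): 4.0}
projected = project_onto_flow_conservation(lam, ["I", "u", "v", "w", "O"])
```

The single backward sweep is what makes the requirement of a loop-free timing graph natural: with cycles there is no topological order in which to process the nodes.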
