Accurate Clock Mesh Sizing via Sequential Quadratic Programming
Venkata Rajesh Mekala, Yifang Liu, Xiaoji Ye, Jiang Hu, Peng Li
Department of ECE, Texas A&M University
ISPD 2010, 3/18/2010
OUTLINE
• Introduction
• Previous Works
• Problem Formulation
• Algorithm Overview
• Results
• Conclusions
Clock Architectures
Clock Tree
• low cost (wiring, power, cap)
• higher skew, jitter than mesh
• widely used in ASIC designs
• clock gating easy to incorporate
Clock Mesh
• excellent for low skew, jitter
• high power, area, capacitance
• difficult to analyze
• clock gating not easy
• used in modern processors
Hybrid: tree + cross-links
• low cost (wiring, power, cap)
• smaller skew, jitter than tree*
• difficult to analyze
Hybrid: mesh + local trees
Clock Mesh
• The clock mesh architecture is very effective in reducing skew variation.
• A clock mesh is difficult to analyze with sufficient accuracy.
• It dissipates more power than other architectures.
• The challenge is to design a mesh that consumes less power while meeting the skew constraints.
Clock Distribution Networks
Clock Trees
• Pullela, Menezes and Pileggi, "Moment-sensitivity-based wire sizing for skew reduction", 1997
• Wang, Ran, Jiang and Sadowska, "General skew constrained clock network sizing based on sequential linear programming", 2005
• Guthaus, Sylvester and Brown, "Clock buffer and wire sizing using sequential programming", 2006
Crosslinks
• Rajaram, Hu and Mahapatra, "Reducing clock skew variability via crosslinks", 2006
• Samanta, Hu and Li, "Discrete buffer and wire sizing for link-based non-tree clock networks", 2008
Clock Mesh
• Desai, Cvijetic and Jensen, "Sizing of clock distribution networks for high performance CPU chips", 1996
• Venkataraman, Feng, Hu and Li, "Combinatorial algorithms for fast clock mesh optimization", 2006
• Rajaram and Pan, "MeshWorks: An efficient framework for planning, synthesis and optimization of clock mesh networks", 2008
Motivation & Our Contributions
• Current-source-based gate modeling approach to speed up the accurate analysis of the clock mesh.
• Efficient adjoint sensitivity analysis to provide the desired sensitivities.
• Algorithm based on rigorous SQP.
• First clock mesh sizing method that performs a systematic solution search and is based on an accurate delay model.
Problem Formulation
• $I$ is the set of interconnects in the clock mesh.
• $x_i$, $i \in I$, is the width of element $i$ in the interconnect set.
• $w_i$, $i \in I$, is the area of element $i$ in the interconnect set.
• $D$ is the coefficient vector reflecting the linear size-area relation.
• $S$ is the set of sinks or local trees.
• $d_j$, $j \in S$, is the propagation delay of the signal from the root of the clock tree to sink $j$.
• µ is the average value of the sink delays and δ is the given maximum variance.
• $L_x$ and $U_x$ represent the lower bound and upper bound vectors of the wire widths.
• Objective: total clock mesh area. Higher wire area leads to a higher load capacitance for the clock buffers, which in turn implies higher power dissipation.
• Constraint: skew constraint in the variance form. The constraint in the quadratic form is a differentiable function.
• Bounds: lower bound and upper bound vectors of the wire widths.
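The annotated formulas on this slide are not preserved in the transcript. Read together with the symbol definitions, the three labels suggest a problem of roughly the following shape. The $1/|S|$ normalization of the variance and the explicit dependence $d_j(x)$ of the delays on the widths are assumptions, not taken from the slide.

```latex
\begin{aligned}
\min_{x}\quad & D^{T}x \;=\; \sum_{i\in I} w_i
  && \text{(total clock mesh area)}\\
\text{s.t.}\quad & \frac{1}{|S|}\sum_{j\in S}\bigl(d_j(x)-\mu\bigr)^2 \;\le\; \delta
  && \text{(skew constraint in variance form)}\\
& L_x \le x \le U_x
  && \text{(wire width bounds)}
\end{aligned}
\qquad\text{with}\qquad
\mu=\frac{1}{|S|}\sum_{j\in S} d_j(x)
```

The objective is linear in $x$, so all of the nonlinearity, and all of the simulation cost, sits in the skew constraint.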
Solving the Problem
• Lagrangian of the original problem.
• The gradient vector of the Lagrangian function is obtained by circuit simulation and adjoint sensitivity analysis.
• The adjoint sensitivity analysis gives us the values of the sensitivity terms in this gradient.
• The sensitivities with respect to wire widths are calculated with the help of the chain rule.
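The chain-rule step can be illustrated with a minimal sketch. It assumes that the adjoint analysis returns the sensitivity of the skew variance to each wire's resistance and capacitance, and that a wire follows a simple sheet-resistance and area-plus-fringe capacitance model. Both assumptions, the `Wire` class, and every constant below are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Wire:
    length: float  # segment length (um)
    width: float   # segment width x_i (um)


# Hypothetical technology constants, chosen only for illustration.
R_SHEET = 0.1    # sheet resistance (ohm per square)
C_AREA = 0.2     # area capacitance (fF per um^2)
C_FRINGE = 0.05  # fringe capacitance (fF per um of length)


def resistance(wire: Wire) -> float:
    """Assumed model: R = R_SHEET * length / width."""
    return R_SHEET * wire.length / wire.width


def capacitance(wire: Wire) -> float:
    """Assumed model: C = C_AREA * length * width + C_FRINGE * length."""
    return C_AREA * wire.length * wire.width + C_FRINGE * wire.length


def width_sensitivity(wire: Wire, d_var_dR: float, d_var_dC: float) -> float:
    """Chain rule: d(var)/dx = d(var)/dR * dR/dx + d(var)/dC * dC/dx.

    d_var_dR and d_var_dC stand in for the values an adjoint
    sensitivity analysis would supply (an assumption of this sketch).
    """
    dR_dx = -R_SHEET * wire.length / wire.width ** 2
    dC_dx = C_AREA * wire.length
    return d_var_dR * dR_dx + d_var_dC * dC_dx


# Usage: one wire segment with made-up adjoint sensitivities.
segment = Wire(length=10.0, width=2.0)
gradient_entry = width_sensitivity(segment, d_var_dR=2.0, d_var_dC=1.5)
```

Widening a wire lowers its resistance and raises its capacitance, so the two terms have opposite signs and the net sensitivity can go either way.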
• Necessary conditions for any optimal point of the problem: the KKT conditions.
• A common way to solve the resulting KKT equation is Newton's method.
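The Lagrangian and its gradient are not preserved in the transcript. A standard form, written for the skew constraint only, is sketched here. The sign convention $\mathcal{L} = f + \lambda c$, the shorthand $\sigma^2(x)$ for the sink-delay variance, and the choice to leave the width bounds to the QP sub-problem are assumptions.

```latex
\begin{aligned}
\mathcal{L}(x,\lambda) &= D^{T}x + \lambda\,\bigl(\sigma^2(x)-\delta\bigr),\\
\nabla_x \mathcal{L}(x,\lambda) &= D + \lambda\,\nabla_x \sigma^2(x),\\
\text{KKT:}\quad & \nabla_x \mathcal{L}(x,\lambda)=0,\quad
\sigma^2(x)-\delta\le 0,\quad
\lambda\ge 0,\quad
\lambda\,\bigl(\sigma^2(x)-\delta\bigr)=0 .
\end{aligned}
```

Because $D$ is constant, only $\nabla_x \sigma^2(x)$ needs circuit simulation and sensitivity analysis.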
• Let the Newton step in iteration k of solving the equation be $(p_{x,k}, p_{\lambda,k})$.
• x and λ are the variables in the equation; $p_{x,k}$ and $p_{\lambda,k}$ are the vectors representing the change in the wire widths and in the Lagrange multiplier.
• The Newton step requires the Jacobian of the equation, which contains the Hessian of the Lagrangian function.
• The Newton step calculation implies that $p_{x,k}$ and $p_{\lambda,k}$ satisfy a linear system built from this Jacobian.
• Rearranging the Newton system for $p_{x,k}$ and $p_{\lambda,k}$ gives an equivalent system.
• That system is solved by a quadratic program: minimize a quadratic objective subject to linearized constraints.
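The systems on these slides are not preserved in the transcript. Under the assumed convention $\mathcal{L}(x,\lambda) = D^{T}x + \lambda\,c(x)$ with $c(x) = \sigma^2(x) - \delta$ treated as an active constraint, the textbook derivation runs as follows. The symbols $c_k$, $\nabla c_k$ and $\nabla^2_{xx}\mathcal{L}_k$ denote values at iterate $k$ and are notation introduced here.

```latex
\begin{aligned}
&\begin{bmatrix} \nabla^2_{xx}\mathcal{L}_k & \nabla c_k \\ \nabla c_k^{T} & 0 \end{bmatrix}
\begin{bmatrix} p_{x,k} \\ p_{\lambda,k} \end{bmatrix}
= -\begin{bmatrix} D + \lambda_k \nabla c_k \\ c_k \end{bmatrix}
\\[4pt]
\Longrightarrow\quad
&\begin{bmatrix} \nabla^2_{xx}\mathcal{L}_k & \nabla c_k \\ \nabla c_k^{T} & 0 \end{bmatrix}
\begin{bmatrix} p_{x,k} \\ \lambda_{k+1} \end{bmatrix}
= -\begin{bmatrix} D \\ c_k \end{bmatrix},
\qquad \lambda_{k+1} = \lambda_k + p_{\lambda,k}
\\[4pt]
&\min_{p}\ \tfrac12\, p^{T}\,\nabla^2_{xx}\mathcal{L}_k\, p + D^{T}p
\quad\text{s.t.}\quad \nabla c_k^{T}p + c_k = 0 .
\end{aligned}
```

The second system is exactly the optimality condition of the QP in the last line, with $\lambda_{k+1}$ as its multiplier. For the inequality-constrained sizing problem the equality would presumably become $\nabla c_k^{T}p + c_k \le 0$, with the bounds $L_x \le x_k + p \le U_x$ added.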
Solving the QP sub-problem
• The QP sub-problem to be solved as a part of SQP minimizes a quadratic objective subject to the linearized skew constraint and the wire width bounds.
• Through sensitivity analysis we obtain the gradient; the sensitivities with respect to wire widths are calculated with the help of the chain rule.
• We use a quasi-Newton (BFGS) method to approximate the Hessian in each iteration.
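For reference, the plain BFGS update of a Hessian approximation can be sketched as below. The slides do not say which BFGS variant is used (plain, damped, or limited-memory), so the textbook formula and the skip rule for non-positive curvature are assumptions. The function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mat_vec(B, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, v) for row in B]


def bfgs_update(B, s, y, eps=1e-12):
    """Return the BFGS update of a symmetric positive definite B.

    s is the step in the variables (x_new - x_old) and y is the change in
    the Lagrangian gradient over that step. The update is skipped when the
    curvature condition y^T s > 0 fails, which keeps B positive definite.
    """
    ys = dot(y, s)
    if ys <= eps:
        return [row[:] for row in B]
    Bs = mat_vec(B, s)
    sBs = dot(s, Bs)
    n = len(s)
    # B_new = B - (B s)(B s)^T / (s^T B s) + y y^T / (y^T s); B is symmetric.
    return [
        [B[i][j] - Bs[i] * Bs[j] / sBs + y[i] * y[j] / ys for j in range(n)]
        for i in range(n)
    ]


# Usage: start from the identity and apply one update.
B0 = [[1.0, 0.0], [0.0, 1.0]]
B1 = bfgs_update(B0, s=[1.0, 0.5], y=[2.0, 0.25])
```

The updated matrix satisfies the secant condition `B1 * s = y`, so it reproduces the observed change in gradient along the last step without any second-derivative simulation.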
Sensitivity Analysis
• Sensitivity information of the original circuit is obtained by a convolution-like computation between the transient waveforms of the original and the adjoint circuits.
• The compact gate model provides up to two orders of magnitude speedup over SPICE simulation while maintaining the same level of accuracy.
• P. Li, Z. Feng and E. Acar. "Characterizing multistage nonlinear drivers and variability for accurate timing and noise analysis". In IEEE Trans. Very Large Scale Integration, pp. 205-214, November 2007.
• X. Ye and P. Li. "An application-specific adjoint sensitivity analysis framework for clock mesh sensitivity computation". In Proc. of IEEE International Symposium on Quality Electronic Design, pp. 634-640, 2009.
CMSSQP Framework
• Initialization of the design (no. of buffers, benchmark and clock mesh); generate SPICE netlist. [SPICE]
• Sensitivity analysis (sensitivities of the $\tau^2$ with respect to wire widths). [C++]
• Quasi-Newton approximation of the Hessian.
• Optimization: formulate and solve the quadratic programming sub-problem. [MOSEK]
• Update the widths of the clock mesh.
• Transient simulation (compute the delays and slew to every sink node). [SPICE]
• Convergence criterion met? NO: repeat the loop. YES: STOP.
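The control flow of this loop can be sketched on a one-variable toy problem. Everything here is a stand-in: `simulate` replaces the SPICE transient run with the made-up model variance = 1/x, `sensitivity` replaces the adjoint analysis, a closed-form one-dimensional solve replaces MOSEK, and a scalar secant update replaces BFGS. None of these names or models come from the slides.

```python
def simulate(x):
    """Stand-in for transient simulation: returns a toy skew variance."""
    return 1.0 / x


def sensitivity(x):
    """Stand-in for adjoint sensitivity analysis: d(variance)/dx."""
    return -1.0 / (x * x)


def solve_qp_1d(B, g, c, a, p_lo, p_hi):
    """Minimize 0.5*B*p^2 + g*p s.t. c + a*p <= 0 and p_lo <= p <= p_hi.

    In one dimension with B > 0 the solution is the unconstrained
    minimizer clamped to the feasible interval (assumed non-empty).
    """
    if a < 0:
        p_lo = max(p_lo, -c / a)
    elif a > 0:
        p_hi = min(p_hi, -c / a)
    return min(max(-g / B, p_lo), p_hi)


def cmssqp_toy(x0, delta=1.0, lower=0.5, upper=4.0, tol=1e-9, max_iters=50):
    """Toy SQP loop: minimize area = x subject to variance(x) <= delta."""
    x, B, g = x0, 1.0, 1.0  # g is d(area)/dx for area = x
    for _ in range(max_iters):
        c = simulate(x) - delta            # constraint value at current widths
        a = sensitivity(x)                 # constraint gradient
        p = solve_qp_1d(B, g, c, a, lower - x, upper - x)
        if abs(p) < tol:                   # convergence criterion met
            break
        # Multiplier of the linearized skew constraint, nonzero only if active.
        active = abs(c + a * p) < 1e-12
        lam = max(0.0, -(B * p + g) / a) if active else 0.0
        x_new = x + p                      # update the width
        y = lam * (sensitivity(x_new) - a) # change in Lagrangian gradient
        if y * p > 1e-12:                  # scalar secant (1-D BFGS) update
            B = y / p
        x = x_new
    return x


# Usage: start feasible (variance below delta) and infeasible (above delta).
width_from_feasible = cmssqp_toy(2.0)
width_from_infeasible = cmssqp_toy(0.6)
```

With this toy model the constraint is tight at x = 1, so both runs should settle there: the feasible start shrinks the wire to save area, and the infeasible start widens it until the variance bound is met.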
Results
Experimental Setup
• 65nm technology transistor models for the buffers
• (m rows × n columns) mesh
• Max skew
• Linux platform with two Intel Xeon E5410 quad-core processors
• ISCAS, ISPD benchmarks
• Widths limited
Initial clock mesh design
Results after executing CMSSQP
Summary: Reduction in area
Area-skew tradeoff by varying δ (ISPD benchmark ispd09f11)
Case (a), $\sigma^2 < \delta$: $\sigma^2$ and total clock mesh area in each iteration
Case (b), $\sigma^2 > \delta$: $\sigma^2$ and total clock mesh area in each iteration
Conclusions & Future work
• Presented an algorithm that reduces clock mesh area while satisfying specified skew constraints.
• Robust in dealing with any complex clock mesh network.
• First clock mesh sizing method that performs a systematic solution search and is based on an accurate delay model.
• Experimental results show about a 33% reduction in clock mesh area.
• Can be extended to size interconnects and mesh buffers simultaneously.
Thanks