Analysis and Optimization of Global Interconnects Sachin Sapatnekar ECE Department University of Minnesota Minneapolis, MN, USA sachin@umn.edu
Acknowledgements
• Prashant Saxena, Synopsys
• Many slides borrowed from Jiang Hu, Texas A&M
• Chuck Alpert, IBM
Outline of the talk
• Interconnect delay metrics
• Interconnects and scaling theory
• Synthesis of signal interconnects
• Noise and congestion issues
Simple delay metrics
Interconnect modeling
• A precise model requires transmission line analysis
• Break up the wire into segments of length dx
• Each segment can be modeled as one of:
  – L-model: series R(+sL), with C to ground
  – π-model: series R(+sL), with C/2 to ground at each end
  – T-model: R/2(+sL/2) on each side, with C to ground at the midpoint
• Other issues (crosstalk etc.) are modeled using coupling caps
• Interconnect extraction
  – Most precise with a 3-D field solver (takes a long time!)
  – Other, faster approximate techniques are useful for design analysis/optimization (R per square, C per unit area, 2.5-D models)
Gate delay models
• Traditionally: assume that the gate drives a capacitor
  – Build macromodels for individual gates
    • Delay = f(widths, transition times, loads)
    • Example: K-factor equations
    • Similar idea used in standard cell characterization: Delay = f(transition times, load)
  – Table lookup models: storage/accuracy tradeoff (e.g. .lib format)
  – Fast circuit simulation: used in many delay calculators
• More recently: effective capacitances, current source/voltage source models
RC delay calculations
• Delays can be calculated easily
• For example: an RC circuit driven by a step excitation
  – Response: $V(t) = 1 - e^{-t/RC}$
  – Time constant = RC
• Time constants for more complicated circuits?
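The single-RC step response can be checked numerically. The sketch below is a minimal illustration, not part of the original slides; the function names and the component values (1 kΩ, 100 fF) are assumptions chosen only for the example.

```python
import math

def rc_step_response(t, r, c, v_final=1.0):
    """Voltage across C at time t for a step of height v_final applied at t = 0."""
    return v_final * (1.0 - math.exp(-t / (r * c)))

def rc_delay_to_fraction(r, c, fraction=0.5):
    """Time for V(t) to reach `fraction` of its final value: -RC * ln(1 - fraction)."""
    return -r * c * math.log(1.0 - fraction)

# Illustrative values (assumed): R = 1 kOhm, C = 100 fF, so RC = 100 ps.
R, C = 1e3, 100e-15
tau = R * C

# After one time constant the output has reached 1 - 1/e of its final value.
print(rc_step_response(tau, R, C))

# The 50% delay, expressed in units of RC, is ln 2.
print(rc_delay_to_fraction(R, C) / tau)
```

The 50% crossing at ln 2 · RC is why a single time constant is often used as a quick delay estimate for one RC stage.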
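Elmore delay for an RC tree
$T_{D,k} = \sum_{i \in \mathrm{Path}(k)} R_i \sum_{j \in \mathrm{downstream}(i)} C_j$
[Figure: RC tree with root, resistances Ra, Rb, Rc, Rd, Re and node capacitances Ca, Cb, Cc, Cd, Ce]
– Elmore delay to node e = Ra(Ca + Cb + Cc + Cd + Ce) + Rb(Cb + Cd + Ce) + Re Ce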
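The Elmore sum can be evaluated mechanically once the tree is written down. The sketch below is a minimal illustration under stated assumptions: the topology (a under the root; b and c under a; d and e under b) is inferred from the delay expression for node e, and the numeric R and C values and all function and variable names are hypothetical.

```python
def elmore_delay(parent, res, cap, sink):
    """Elmore delay from the root to `sink`.

    parent[n]: parent node of n (None if n hangs directly off the root)
    res[n]:    resistance of the edge entering n
    cap[n]:    capacitance at node n
    """
    def depth(node):
        d = 0
        while parent[node] is not None:
            node = parent[node]
            d += 1
        return d

    # Downstream capacitance of a node = its own cap plus that of all descendants.
    # Visiting the deepest nodes first guarantees children are folded in before parents.
    downstream = dict(cap)
    for node in sorted(cap, key=depth, reverse=True):
        if parent[node] is not None:
            downstream[parent[node]] += downstream[node]

    # Sum R_i * (downstream capacitance of i) along the root-to-sink path.
    delay = 0
    node = sink
    while node is not None:
        delay += res[node] * downstream[node]
        node = parent[node]
    return delay

# Assumed topology and made-up values, for illustration only.
parent = {"a": None, "b": "a", "c": "a", "d": "b", "e": "b"}
res = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
cap = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}

# Ra*(Ca+Cb+Cc+Cd+Ce) + Rb*(Cb+Cd+Ce) + Re*Ce = 1*15 + 2*11 + 5*5
print(elmore_delay(parent, res, cap, "e"))  # 62
```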
Incrementally calculating the Elmore delay
[Figure: RC chain A – R1 – B – R2 – C, with C1 at node B and C2 at node C]
$\mathrm{Delay}(A - C) = R_1 (C_1 + C_2) + R_2 C_2$
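For a chain of RC segments, the Elmore delay can be accumulated one segment at a time. The sketch below assumes a two-segment chain A – R1 – B – R2 – C with C1 at B and C2 at C (one reading of the slide's figure labels), for which the delay is R1(C1 + C2) + R2 C2. The function name and the numeric values are hypothetical.

```python
def chain_elmore_delay(segments):
    """Elmore delay across a chain of (R, C) segments listed from source to sink."""
    delay = 0
    downstream_cap = 0
    # Walk from the sink back toward the source, adding one segment at a time.
    for r, c in reversed(segments):
        downstream_cap += c          # this segment's cap joins the downstream load
        delay += r * downstream_cap  # its resistance sees everything downstream
    return delay

# Made-up values: R1 = 2, C1 = 3, R2 = 5, C2 = 7.
# R1*(C1 + C2) + R2*C2 = 2*10 + 5*7
print(chain_elmore_delay([(2, 3), (5, 7)]))  # 55
```

Each step needs only the running downstream capacitance and the running delay, which is what makes the calculation incremental.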
Model order reduction methods
• Elmore delay: RC transfer function $H(s) \approx \dfrac{a_0}{b_0 + b_1 s}$
• Can approximate RC circuit transfer function as $H(s) \approx \dfrac{a_0 + a_1 s + \dots + a_{n-1} s^{n-1}}{b_0 + b_1 s + \dots + b_{n-1} s^{n-1} + b_n s^n}$
  – Response approximated as a sum of exponentials
  – Useful for interconnect simulation
  – Other variants: PVL, PRIMA, etc.
  – Handles linear systems, but drivers may be nonlinear
[Figure: waveforms e(t) and e'(t) versus t, with delay t_d marked]
Effective capacitance model
• Includes the effects of gate nonlinearities
• Gate driving RC interconnect [Figure: gate output node x driving R, C1, C2]
  – Determine waveform at gate output; analyze interconnect as a linear system after that
• Possible model for waveform at x
  – Gate driving total capacitance of net?
    • Gives erroneous results due to resistive shielding
  – Actual effective capacitance < total wiring capacitance
  – Techniques exist for determining C_effective, or modeling the gate using a voltage/current source
Computing C_eff: Overall flow [C. Kashyap]
• Start with C_new = C_tot
• Set C_eff = C_new
• Compute Thevenin model at C_eff
• Match charge to get C_new
• C_eff = C_new? No: set C_eff = C_new and repeat. Yes: compute delay, slew
Current source model
• Represents the transistor I-V curve as a function of input slew and output load
• Linear Thevenin driver: delay = f(slew, C_load) [Figure: voltage source with resistance r_d driving V_out]
• Current source: I_out = f(slew, C_load)
• CCSM (Synopsys), ECSM (Cadence)
[Amin, DAC06]
Wire tapering and layer assignment
• Elmore delay: $T_{D,k} = \sum_{i \in \mathrm{Path}(k)} R_i \sum_{j \in \mathrm{downstream}(i)} C_j$
  – Wires near the root must have low resistances
  – Wires near the leaves must have low capacitances
  – Wider wires near root, narrower near leaves
• In practice: # of wire widths limited to two or three
• Same principle applies to layer assignment
Simple buffer insertion problem
Given: source and sink locations, sink capacitances and RATs (required arrival times), a buffer type, source delay rules, unit wire resistance and capacitance
[Figure: source s0 driving four sinks with RAT1 to RAT4, with a buffer on the net]
Simple buffer insertion problem
Find: buffer locations and a routing tree such that the slack at the source is maximized:
$q(s_0) = \min_{1 \le i \le 4} \{ RAT(s_i) - delay(s_0, s_i) \}$
[Figure: source s0 driving four sinks with RAT1 to RAT4]
Slack example
[Figure: two solutions for a net with two sinks, RAT = 500 and RAT = 400. Labeled delays: 400, 600, 350, 300. Resulting slacks: +100 and -200]
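Source slack is the minimum over sinks of RAT minus delay, which is easy to compute directly. In the sketch below, the pairing of each delay with a sink and the grouping into two solutions are assumptions, since the extracted figure does not preserve them; the function and variable names are hypothetical.

```python
def source_slack(sinks):
    """q(s0) = min over sinks of (RAT - delay from the source to that sink)."""
    return min(rat - delay for rat, delay in sinks)

# Each entry is (RAT, delay). Pairing and grouping are assumed, not taken from the slide.
solution_a = [(500, 400), (400, 600)]
solution_b = [(500, 350), (400, 300)]

print(source_slack(solution_a))  # -200: the RAT = 400 sink is 200 late
print(source_slack(solution_b))  # 100: both sinks meet their RATs
```

Under this assumed pairing the two results agree with the slack values printed on the slide.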
Interconnects and Scaling Theory
A scaling primer
• Ideal process scaling:
  – Device geometries shrink by σ (= 0.7x)
    • Device delay shrinks by σ
  – Wire geometries shrink by σ
    • Resistance: ρl/(wσ · hσ) = R/σ²
    • Coupling cap: ε(hσ)l/(Sσ) = same
    • Capacitance to ground: similar
• In each process generation, R doubles, C and Cc unchanged
• But it doesn’t quite work that way
  – h scales by less than σ to control R
[Figure: transistor (G, S, D) and wire cross-sections with width w, height h, spacing S, length l, before and after scaling by σ]
Block scaling
• Block area often stays the same
  – # cells, # nets double
• Wiring histogram shape (almost) invariant
  – Global interconnect lengths don’t shrink
  – Local interconnect lengths shrink by σ
A typical chip cross-section
• Wires become “fatter” as you move to upper layers
• From one technology to the next, wire aspect ratios become more skewed
• R is controlled, at the expense of coupling capacitance
[Figure: chip cross-section, Intel]
The role of interconnects
• Short interconnect
  – Used to connect nearby cells, R_driver >> R_interconnect
  – Minimize wire C, i.e., use short min-width wires
• Medium to long-distance (“global”) interconnect
  – R_driver ≈ R_interconnect
  – Size wires to trade off area vs. delay
  – Increasing width ⇒ capacitance increases, resistance decreases
  – Need to find an acceptable tradeoff: the wire sizing problem
• “Fat” wires
  – Thicker cross-sections in higher metal layers
  – Useful for reducing delays for global wires
  – Inductance issues, sharing of limited resource
Interconnect delay scaling
• Delay of a wire of length l: $\tau_{int} = (rl)(cl) = rcl^2$ (first order)
• Local interconnects: $\tau_{int} \to (r/\sigma^2)(c)(l\sigma)^2 = rcl^2$
  – Local interconnect delay unchanged (but devices get faster)
• Global interconnects: $\tau_{int} \to (r/\sigma^2)(c)(l)^2 = rcl^2/\sigma^2$
  – Global interconnect delay doubles: unsustainable!
  – Problem somewhat mitigated using buffers, using nonideal scaling as outlined earlier
• Interconnect delay increasingly dominant
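The per-generation scaling factors follow from plugging σ = 0.7 into the first-order rcl² expression. The sketch below assumes ideal scaling (r per unit length grows as 1/σ², c per unit length unchanged); the function and variable names are hypothetical.

```python
SIGMA = 0.7  # ideal per-generation shrink factor

def wire_delay_scale(length_scale, sigma=SIGMA):
    """Factor by which r*c*l^2 changes in one generation under ideal scaling."""
    r_scale = 1.0 / sigma**2   # resistance per unit length grows as 1/sigma^2
    c_scale = 1.0              # capacitance per unit length is unchanged
    return r_scale * c_scale * length_scale**2

local_factor = wire_delay_scale(SIGMA)   # local wire lengths shrink by sigma
global_factor = wire_delay_scale(1.0)    # global wire lengths do not shrink

print(local_factor)            # close to 1: local wire delay unchanged
print(global_factor)           # close to 2: global wire delay doubles
print(local_factor / SIGMA)    # local wire delay relative to gate delay, which shrinks by sigma
```

The last line shows why even "unchanged" local wires matter: measured against gates that speed up by σ, they look about 1.4x slower each generation.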
ITRS projections
[Figure: relative delay vs. feature size (250, 180, 130, 90, 65, 45, 32 nm) for gate delay (fanout 4), local interconnect (M1,2), global interconnect with repeaters, and global interconnect without repeaters. Source: ITRS, 2003]
[Figure: ITRS ILD roadmap evolution, effective k vs. technology node (0.25 to 0.045 µm), comparing the 1997, 1999, and 2003 ITRS with the actual industry trend. Source: Chia Hong Jan, IEDM 2003 Interconnect Short Course]
• ITRS projections are often a “best case scenario” projection
Buffer insertion
• Consider [Figure: unbuffered net vs. buffered net]
• A buffer effectively isolates the downstream capacitance
Optimizing medium/long interconnects
• Delays of interconnects may become very large
• Wire sizing helps to control the delay
• Repeater insertion is another effective technique
• Effects of a buffer
  – Isolates load capacitances of different “stages”
  – Adds a delay
[Figure: driver R_driver feeding subtree capacitances C_L1 and C_L2, shown without and with a buffer C_buf in front of C_L2. With the buffer, the downstream capacitance at the marked node is C_L1 + C_buf (C_L2 is isolated by the buffer)]
Buffered global interconnects: Intuition
• Unbuffered wire of length l: interconnect delay = $r c l^2$
• Wire split by buffers into segments $l_1, l_2, l_3, \dots, l_n$ (where $l = \sum_j l_j$): now, interconnect delay = $\sum_i r c l_i^2 < r c l^2$, since $\sum_j l_j^2 < (\sum_j l_j)^2$
• (Of course, account for intrinsic buffer delay also)
More precise analysis: Optimal inter-buffer length
• First order (lumped parasitic, Elmore delay) analysis
  – R_d: on resistance of inverter
  – C_g: gate input capacitance
  – r, c: resistance, cap. per micron
[Figure: wire of total length L broken by identical buffers into segments of length l]
• Assume N identical buffers with equal inter-buffer length l (L = Nl):
  $T = N \left[ R_d (C_g + cl) + rl (C_g + cl) \right]$
  $\;\;\; = L \left[ rcl + (r C_g + R_d c) + \frac{1}{l} R_d C_g \right]$
• For minimum delay:
  $\frac{dT}{dl} = 0 \;\Rightarrow\; L \left[ rc - \frac{R_d C_g}{l_{opt}^2} \right] = 0 \;\Rightarrow\; l_{opt} = \sqrt{\frac{R_d C_g}{rc}}$
Optimal interconnect delay
• Substituting $l_{opt} = \sqrt{R_d C_g / (rc)}$ back into the interconnect delay expression:
  $T_{opt} = L \left[ rc\, l_{opt} + (r C_g + R_d c) + \frac{1}{l_{opt}} R_d C_g \right]$
  $\;\;\;\;\;\; = L \left[ 2 \sqrt{R_d C_g\, rc} + r C_g + R_d c \right]$
• Delay grows linearly with L (instead of quadratically)
• Buffer-to-buffer spacing reduces in successive technology nodes
[Figure: buffer spacing under a “dumb shrink” vs. a “smart shrink”, with spacings labeled d and dσ]
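The closed-form results can be sanity-checked numerically. The sketch below evaluates the buffered delay T = L[rcl + (rC_g + R_d c) + R_d C_g / l], the optimal spacing sqrt(R_d C_g / (rc)), and the optimal delay L[2 sqrt(R_d C_g rc) + rC_g + R_d c]. The function names and all parameter values are assumptions chosen only for illustration, not data from the slides.

```python
import math

def buffered_delay(L, l, Rd, Cg, r, c):
    """T = L * [r*c*l + (r*Cg + Rd*c) + Rd*Cg / l] for inter-buffer length l."""
    return L * (r * c * l + (r * Cg + Rd * c) + Rd * Cg / l)

def optimal_spacing(Rd, Cg, r, c):
    """l_opt = sqrt(Rd*Cg / (r*c))."""
    return math.sqrt(Rd * Cg / (r * c))

def optimal_delay(L, Rd, Cg, r, c):
    """T_opt = L * [2*sqrt(Rd*Cg*r*c) + r*Cg + Rd*c]."""
    return L * (2.0 * math.sqrt(Rd * Cg * r * c) + r * Cg + Rd * c)

# Illustrative values (assumed): ohms, farads, and per-micron wire parasitics.
Rd, Cg = 1000.0, 1e-15        # driver on-resistance, gate input capacitance
r, c = 1.0, 0.2e-15           # wire resistance and capacitance per micron
L_total = 10000.0             # total wire length in microns

l_opt = optimal_spacing(Rd, Cg, r, c)
t_opt = optimal_delay(L_total, Rd, Cg, r, c)

print(l_opt)                                            # optimal spacing in microns
print(t_opt)                                            # buffered delay at l_opt
print(buffered_delay(L_total, l_opt, Rd, Cg, r, c))     # matches t_opt
print(optimal_delay(2 * L_total, Rd, Cg, r, c) / t_opt) # doubling L doubles the delay
print(r * c * L_total**2)                               # unbuffered first-order r*c*L^2
```

Evaluating `buffered_delay` at spacings on either side of `l_opt` gives a larger delay, consistent with `l_opt` being the minimum.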