Analysis and Optimization of Global Interconnects Sachin Sapatnekar - PowerPoint PPT Presentation

Analysis and Optimization of Global Interconnects Sachin Sapatnekar ECE Department University of Minnesota Minneapolis, MN, USA sachin@umn.edu

Acknowledgements
Many slides borrowed from
• Chuck Alpert, IBM
• Jiang Hu, Texas A&M
• Prashant Saxena, Synopsys

Outline of the talk
• Interconnect delay metrics
• Interconnects and scaling theory
• Synthesis of signal interconnects
• Noise and congestion issues

Simple delay metrics

Interconnect modeling
• Precise model requires transmission line analysis
• Break up wire into segments (length dx); each segment can be modeled as
  – L-model: series R(+sL), C to ground
  – π-model: series R(+sL), C/2 to ground at each end
  – T-model: R/2(+sL/2) on each side, C to ground at the center
• Other issues (crosstalk etc.) modeled using coupling caps
• Interconnect extraction
  – Most precise with a 3-D field solver (takes a long time!)
  – Other faster approximate techniques useful for design analysis/optimization (R per square, C per unit area, 2.5-D models)

Gate delay models
• Traditionally: assume that the gate drives a capacitor
  – Build macromodels for individual gates
    • Delay = f(widths, transition times, loads)
    • Example: K-factor equations
    • Similar idea used in standard cell characterization: Delay = f(transition times, load)
  – Table lookup models: storage/accuracy tradeoff (e.g. .lib format)
  – Fast circuit simulation: used in many delay calculators
• More recently: effective capacitances, current source/voltage source models

RC delay calculations
• Delays can be calculated easily
• For example: RC driven by a step excitation
  – Response: $V(t) = 1 - e^{-t/RC}$
  – Time constant = RC
• Time constants for more complicated circuits?
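The step response above can be evaluated directly. The following is a minimal sketch, assuming a unit step and hypothetical component values (`R`, `C`); the function names are illustrative and not from the slides. It also shows the 50% delay, which for a single RC stage is RC·ln 2.

```python
import math

def rc_step_response(t, r, c, v_final=1.0):
    """Voltage across C at time t for a step of height v_final applied at t = 0."""
    return v_final * (1.0 - math.exp(-t / (r * c)))

def rc_delay(r, c, fraction=0.5):
    """Time at which the response reaches `fraction` of its final value."""
    return -r * c * math.log(1.0 - fraction)

# Hypothetical values chosen only for illustration.
R = 1e3      # ohms
C = 1e-13    # farads
tau = R * C              # time constant
t50 = rc_delay(R, C)     # 50% delay = RC * ln(2)

print(f"tau = {tau:.3e} s, t50 = {t50:.3e} s")
print(f"V(tau) = {rc_step_response(tau, R, C):.4f}")
```

At t = RC the response has reached 1 − 1/e of its final value, which is why RC is the natural delay unit for a single stage.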

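Elmore delay for an RC tree
$T_{D,k} = \sum_{i \in \mathrm{Path}(k)} R_i \sum_{j \in \mathrm{downstream}(i)} C_j$
[Figure: RC tree with elements Ra, Ca at the root, then Rb, Cb and Rc, Cc, then Rd, Cd and Re, Ce]
– Elmore delay to node e = Ra·(Ca + Cb + Cc + Cd + Ce) + Rb·(Cb + Cd + Ce) + Re·Ce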
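The Elmore sum can be computed in two passes over the tree: one to accumulate downstream capacitance, one to accumulate delay along each path. The sketch below assumes the tree topology implied by the expression for node e (c hangs off a; d and e hang off b); the dictionary layout, function name, and numeric values are hypothetical.

```python
def elmore_delays(parent, res, cap):
    """Return the Elmore delay from the root to every node of an RC tree.

    parent[n] is the node upstream of n (None if n is fed directly by the root),
    res[n] is the resistance of the branch entering n, cap[n] the capacitance at n.
    """
    children = {n: [] for n in parent}
    for n, p in parent.items():
        if p is not None:
            children[p].append(n)

    downstream = {}

    def subtree_cap(n):
        # Total capacitance at n and everything below it.
        downstream[n] = cap[n] + sum(subtree_cap(ch) for ch in children[n])
        return downstream[n]

    delay = {}

    def walk(n, upstream_delay):
        # Each branch resistance sees all capacitance downstream of it.
        delay[n] = upstream_delay + res[n] * downstream[n]
        for ch in children[n]:
            walk(ch, delay[n])

    for n, p in parent.items():
        if p is None:
            subtree_cap(n)
            walk(n, 0.0)
    return delay

# Assumed topology: root -> a; a -> b, c; b -> d, e. Values are illustrative.
parent = {"a": None, "b": "a", "c": "a", "d": "b", "e": "b"}
res = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
cap = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
delays = elmore_delays(parent, res, cap)
print(delays["e"])  # Ra*(Ca+Cb+Cc+Cd+Ce) + Rb*(Cb+Cd+Ce) + Re*Ce
```

Both passes visit each node once, so the delay to every node is obtained in time linear in the tree size.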

Incrementally calculating the Elmore delay
[Figure: RC line between nodes A and B with elements R1, C1, R2, C2, annotated with an expression relating the delays at A and B]

Model order reduction methods
• Elmore delay: approximates the RC transfer function as $H(s) \approx \dfrac{a_0}{b_0 + b_1 s}$
• Can approximate RC circuit transfer function as
  $H(s) \approx \dfrac{a_0 + a_1 s + \dots + a_{n-1} s^{n-1}}{b_0 + b_1 s + \dots + b_{n-1} s^{n-1} + b_n s^n}$
  – Response approximated as a sum of exponentials
  – Useful for interconnect simulation
  – Other variants: PVL, PRIMA, etc.
  – Handles linear systems, but drivers may be nonlinear
[Figure: e(t) and e'(t) plotted against t, with t_d marked]

Effective capacitance model
• Includes the effects of gate nonlinearities
• Gate driving RC interconnect (gate output node x)
  – Determine waveform at gate output; analyze interconnect as a linear system after that
• Possible model for waveform at x
  – Gate driving total capacitance of net?
    • Gives erroneous results due to resistive shielding
  – Actual effective capacitance < total wiring capacitance
  – Techniques exist for determining C_effective, or modeling the gate using a voltage/current source
[Figure: gate output x driving C1, R, C2]

Computing C_eff: Overall flow [C. Kashyap]
• Start with C_new = C_tot
• Set C_eff = C_new
• Compute Thevenin model at C_eff
• Match charge to get C_new
• C_eff = C_new?
  – No: set C_eff = C_new and repeat
  – Yes: compute delay, slew

Current source model
• Represents the transistor I-V curve as a function of input slew and output load
• Linear Thevenin driver: delay = f(slew, C_load)
• CCSM (Synopsys), ECSM (Cadence): I_out = f(slew, C_load)
[Figure: driver model with rd and V_out] [Amin, DAC06]

Wire tapering and layer assignment
• Elmore delay: $T_{D,k} = \sum_{i \in \mathrm{Path}(k)} R_i \sum_{j \in \mathrm{downstream}(i)} C_j$
  – Wires near the root must have low resistances
  – Wires near the leaves must have low capacitances
  – Wider wires near root, narrower near leaves
• In practice: # of wire widths limited to two or three
• Same principle applies to layer assignment
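The tapering argument can be checked on a two-segment wire. This is a rough sketch under simplifying assumptions that are not from the slides: each segment is a lumped L-model, resistance scales as 1/width, capacitance scales linearly with width (no fringe term), and the driver resistance is ignored. All names and values are hypothetical.

```python
def segment_rc(width, r_unit=1.0, c_unit=1.0):
    """Lumped R and C of one segment: R ~ 1/width, C ~ width (assumed model)."""
    return r_unit / width, c_unit * width

def two_segment_elmore(r1, c1, r2, c2, c_load):
    """Elmore delay to the load through segment 1 (near root) then segment 2."""
    return r1 * (c1 + c2 + c_load) + r2 * (c2 + c_load)

C_LOAD = 1.0  # illustrative sink capacitance

r_wide, c_wide = segment_rc(2.0)
r_narrow, c_narrow = segment_rc(1.0)

# Wide segment near the root, narrow segment near the leaf.
wide_first = two_segment_elmore(r_wide, c_wide, r_narrow, c_narrow, C_LOAD)
# Same two segments in the opposite order.
narrow_first = two_segment_elmore(r_narrow, c_narrow, r_wide, c_wide, C_LOAD)

print(wide_first, narrow_first)
```

The root segment's resistance multiplies all downstream capacitance, so putting the low-resistance (wide) segment there gives the smaller delay even though the total wire area is the same in both orderings.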

Simple buffer insertion problem
Given: Source and sink locations, sink capacitances and required arrival times (RATs), a buffer type, source delay rules, unit wire resistance and capacitance
[Figure: source s0, a buffer, and four sinks labeled RAT 1 to RAT 4]

Simple buffer insertion problem
Find: Buffer locations and a routing tree such that slack at the source is maximized
$q(s_0) = \min_{1 \le i \le 4} \{ RAT(s_i) - delay(s_0, s_i) \}$
[Figure: source s0 and four sinks labeled RAT 1 to RAT 4]
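The source slack is the worst (smallest) value of RAT minus delay over all sinks, and a candidate solution is better when that worst-case slack is larger. A minimal sketch follows; the sink names, RATs, and delays are hypothetical values chosen for illustration and are not taken from the slides.

```python
def source_slack(rat, delay):
    """q(s0) = min over sinks of RAT(s_i) - delay(s0, s_i)."""
    return min(rat[s] - delay[s] for s in rat)

# Hypothetical required arrival times for four sinks.
rat = {"s1": 500, "s2": 400, "s3": 450, "s4": 600}

# Hypothetical source-to-sink delays for two candidate buffered trees.
candidates = {
    "tree_A": {"s1": 420, "s2": 390, "s3": 300, "s4": 450},
    "tree_B": {"s1": 380, "s2": 430, "s3": 280, "s4": 400},
}

slacks = {name: source_slack(rat, d) for name, d in candidates.items()}
best = max(slacks, key=slacks.get)  # prefer the largest worst-case slack

print(slacks, best)
```

Note that tree_B is faster to three of the four sinks, yet it loses because its single late sink sets the source slack.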

Slack example
• delay = 400, delay = 600, delay = 350, delay = 300
• RAT = 500, RAT = 400, RAT = 500, RAT = 400
• slack = +100, slack = −200

Interconnects and Scaling Theory

A scaling primer
• Ideal process scaling:
  – Device geometries shrink by σ (= 0.7x)
    • Device delay shrinks by σ
  – Wire geometries shrink by σ
    • Resistance: $\rho l / (w\sigma \cdot h\sigma) = R/\sigma^2$
    • Coupling cap: $\varepsilon (h\sigma) l / (S\sigma)$ = same
    • Capacitance to ground: similar
• In each process generation, R doubles, C and Cc unchanged
• But it doesn't quite work that way
  – h scales by less than σ to control R
[Figure: transistor (G, S, D) and wire cross-sections with width w, spacing S, height h, length l, before and after scaling by σ]

Block scaling
• Block area often stays same
  – # cells, # nets double
• Wiring histogram shape (almost) invariant
  – Global interconnect lengths don't shrink
  – Local interconnect lengths shrink by σ

A typical chip cross-section
• Wires become "fatter" as you move to upper layers
• From one technology to the next, wire aspect ratios become more skewed
• R is controlled, at the expense of coupling capacitance
[Figure: chip cross-section. Source: Intel]

The role of interconnects
• Short interconnect
  – Used to connect nearby cells, R_driver >> R_interconnect
  – Minimize wire C, i.e., use short min-width wires
• Medium to long-distance ("global") interconnect
  – R_driver ≈ R_interconnect
  – Size wires to trade off area vs. delay
  – Increasing width ⇒ capacitance increases, resistance decreases
  – Need to find acceptable tradeoff: the wire sizing problem
• "Fat" wires
  – Thicker cross-sections in higher metal layers
  – Useful for reducing delays for global wires
  – Inductance issues, sharing of limited resource

Interconnect delay scaling
• Delay of a wire of length l: $\tau_{int} = (rl)(cl) = rcl^2$ (first order)
• Local interconnects: $\tau_{int} = (r/\sigma^2)(c)(l\sigma)^2 = rcl^2$
  – Local interconnect delay unchanged (but devices get faster)
• Global interconnects: $\tau_{int} = (r/\sigma^2)(c)(l)^2 = rcl^2/\sigma^2$
  – Global interconnect delay doubles: unsustainable!
  – Problem somewhat mitigated using buffers, using nonideal scaling as outlined earlier
• Interconnect delay increasingly more dominant
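The two scaling cases can be checked numerically. The sketch below applies one generation of ideal scaling (σ = 0.7, per-unit-length resistance r/σ², per-unit-length capacitance unchanged) to the first-order delay rcl²; the normalized starting values and function names are hypothetical.

```python
SIGMA = 0.7  # ideal shrink factor per generation, as in the slides

def wire_delay(r, c, length):
    """First-order wire delay: (r*l)*(c*l) = r*c*l^2."""
    return r * c * length ** 2

def scaled_delays(r, c, length, sigma=SIGMA):
    """Delays after one generation for a local and a global wire."""
    r_next = r / sigma ** 2                               # resistance per unit length grows
    local = wire_delay(r_next, c, length * sigma)          # local wire length shrinks by sigma
    global_ = wire_delay(r_next, c, length)                # global wire length stays the same
    return local, global_

# Normalized, illustrative starting point.
r0, c0, l0 = 1.0, 1.0, 1.0
base = wire_delay(r0, c0, l0)
local, global_ = scaled_delays(r0, c0, l0)

print(f"local/base  = {local / base:.3f}")
print(f"global/base = {global_ / base:.3f}")
```

With σ = 0.7 the global ratio is 1/0.49, which is the "doubles every generation" figure quoted above.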

ITRS projections
[Figure: relative delay vs. feature size (250 to 32 nm) for gate delay (fanout 4), local interconnect (M1,2), global interconnect with repeaters, and global interconnect without repeaters. Source: ITRS, 2003]
[Figure: ITRS ILD roadmap evolution, effective k vs. technology node (µm), comparing the 1997, 1999, and 2003 ITRS with the industry actual trend. Source: Chia Hong Jan, IEDM 2003 Interconnect Short Course]
• ITRS projections often a "best case scenario" projection

Buffer insertion
• Consider [Figure: unbuffered net] vs. [Figure: buffered net]
• A buffer effectively isolates the downstream capacitance

Optimizing medium/long interconnects
• Delays of interconnects may become very large
• Wire sizing helps to control the delay
• Repeater insertion is another effective technique
• Effects of a buffer
  – Isolates load capacitances of different "stages"
  – Adds a delay
[Figure: R_driver driving two subtrees with capacitances C_L1 and C_L2, with a buffer (C_buf) in front of the second subtree. Downstream capacitance at the driver is C_L1 + C_buf (C_L2 is isolated by the buffer)]

Buffered global interconnects: Intuition
• Unbuffered wire of length l: interconnect delay = $rcl^2$
• Buffered wire with segments $l_1, l_2, l_3, \dots, l_n$ (where $l = \sum_j l_j$): interconnect delay = $\sum_i rc\,l_i^2 < rcl^2$, since $\sum_j l_j^2 < \left(\sum_j l_j\right)^2$
• (Of course, account for intrinsic buffer delay also)
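The inequality is easy to see with numbers. A minimal sketch, assuming normalized r and c and hypothetical segment lengths; buffer intrinsic delay is deliberately left out, as the slide notes it must be added separately.

```python
def unbuffered_delay(r, c, segments):
    """Wire delay when the segments form one unbroken wire: r*c*(sum l_j)^2."""
    return r * c * sum(segments) ** 2

def buffered_wire_delay(r, c, segments):
    """Wire delay when a buffer separates each segment: sum of r*c*l_i^2.

    Intrinsic buffer delay is NOT included here.
    """
    return r * c * sum(l * l for l in segments)

r, c = 1.0, 1.0                 # normalized, illustrative values
uneven = [1.0, 2.0, 3.0]        # hypothetical segment lengths
even = [2.0, 2.0, 2.0]          # same total length, equal segments

print(unbuffered_delay(r, c, uneven), buffered_wire_delay(r, c, uneven))
print(unbuffered_delay(r, c, even), buffered_wire_delay(r, c, even))
```

For n equal segments the wire-delay term drops by a factor of n, which is what turns quadratic growth into linear growth once the number of buffers scales with length.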

More precise analysis: Optimal inter-buffer length
• First order (lumped parasitic, Elmore delay) analysis
  – $R_d$: on resistance of inverter
  – $C_g$: gate input capacitance
  – r, c: resistance, cap. per micron
• Assume N identical buffers with equal inter-buffer length l (L = Nl)
  $T = N\left[ R_d (C_g + cl) + rl\,(C_g + cl) \right] = L\left[ rcl + (rC_g + R_d c) + \frac{1}{l} R_d C_g \right]$
• For minimum delay,
  $\frac{dT}{dl} = 0 \;\Rightarrow\; L\left[ rc - \frac{R_d C_g}{l_{opt}^2} \right] = 0 \;\Rightarrow\; l_{opt} = \sqrt{\frac{R_d C_g}{rc}}$
[Figure: wire of total length L divided by identical buffers into segments of length l]

Optimal interconnect delay
• Substituting $l_{opt}$ back into the interconnect delay expression:
  $T_{opt} = L\left[ rc\,l_{opt} + (rC_g + R_d c) + \frac{1}{l_{opt}} R_d C_g \right] = L\left[ 2\sqrt{R_d C_g rc} + rC_g + R_d c \right]$
• Delay grows linearly with L (instead of quadratically)
• With $l_{opt} = \sqrt{\frac{R_d C_g}{rc}}$, buffer-to-buffer spacing reduces in successive technology nodes
[Figure: buffer spacing (labels d, σ) under "Dumb shrink" and "Smart shrink"]
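The closed-form results can be cross-checked against the general delay expression. The sketch below uses the same lumped first-order model, T = L[rcl + (rC_g + R_d c) + R_d C_g / l]; the parameter values are hypothetical and chosen only to give plausible magnitudes, and the function names are illustrative.

```python
import math

def buffered_delay(L, l, Rd, Cg, r, c):
    """Total delay for a wire of length L with buffers every l (lumped model)."""
    return L * (r * c * l + (r * Cg + Rd * c) + Rd * Cg / l)

def optimal_spacing(Rd, Cg, r, c):
    """l_opt = sqrt(Rd*Cg / (r*c))."""
    return math.sqrt(Rd * Cg / (r * c))

def optimal_delay(L, Rd, Cg, r, c):
    """T_opt = L * (2*sqrt(Rd*Cg*r*c) + r*Cg + Rd*c)."""
    return L * (2 * math.sqrt(Rd * Cg * r * c) + r * Cg + Rd * c)

# Hypothetical parameters (not from the slides).
Rd = 1e3        # ohms, inverter on-resistance
Cg = 1e-15      # farads, gate input capacitance
r = 1.0         # ohms per micron
c = 0.2e-15     # farads per micron
L = 1e4         # microns, total wire length

l_opt = optimal_spacing(Rd, Cg, r, c)
t_opt = optimal_delay(L, Rd, Cg, r, c)

print(f"l_opt = {l_opt:.1f} um, T_opt = {t_opt:.3e} s")
print(f"T at l_opt/2 = {buffered_delay(L, l_opt / 2, Rd, Cg, r, c):.3e} s")
print(f"T at 2*l_opt = {buffered_delay(L, 2 * l_opt, Rd, Cg, r, c):.3e} s")
```

Evaluating the general expression on either side of l_opt shows the delay rising in both directions, and doubling L doubles T_opt, consistent with linear growth in wire length.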
