Retiming & Pipelining over Global Interconnects
Jason Cong
Computer Science Department, University of California, Los Angeles
cong@cs.ucla.edu
http://cadlab.cs.ucla.edu/~cong
Joint work with C. C. Chang, D. Pan*, and X. Yuan
* IBM Research
Motivation: How Far Can We Go in Each Clock Cycle
• NTRS'97 0.07 µm technology
• 5 GHz across-chip clock
• 620 mm² (24.9 mm x 24.9 mm)
• IPEM BIWS estimations
  - Buffer size: 100x
  - Driver/receiver size: 100x
• From corner to corner: 7 clock cycles
[Figure: chip map showing the regions reachable in 1 to 7 clock cycles; axis ticks at 0, 7.52, 15.04, 22.56, and 24.9 (mm)]
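A rough cross-check of the corner-to-corner figure. It rests on two assumptions that are readings of the figure, not statements on the slide: the 7.52 mm tick spacing is the distance a signal covers in one clock cycle, and a corner-to-corner route is rectilinear (24.9 mm in each direction).

\[
\left\lceil \frac{24.9 + 24.9}{7.52} \right\rceil = \lceil 6.62 \rceil = 7 \ \text{clock cycles}
\]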
Solutions
• Fully asynchronous designs
• GALS (globally asynchronous, locally synchronous designs)
• Latency-insensitive designs
• Synchronous designs, with multi-cycle communications
  - Much better understood
  - Supported by the current tool set
  - More energy efficient?
Interconnect-Centric IC Design Flow
Under Development at UCLA
[Flow diagram, top to bottom]
Architecture/Conceptual-level Design
Design Specification
Interconnect Planning
• Physical Hierarchy Generation for Multi-Cycle Comm.
• Interconnect Architecture Planning
Synthesis and Placement under Physical Hierarchy
Interconnect Synthesis (TRIO)
• Topology Optimization with Buffer Insertion
• Wire sizing and spacing
• Simultaneous Buffer Insertion and Wire Sizing
• Simultaneous Topology Construction with Buffer Insertion and Wire Sizing
Interconnect Layout
• Route Planning
• Point-to-Point Gridless Routing
Final Layout
[Side blocks]
Interconnect Performance Estimation Models (IPEM)
• OWS, SDWS, BISWS
HDM abstraction
• Structure view, Functional view, Physical view, Timing view
Interconnect Optimization
• Topology generation & wire sizing for delay
• Wire ordering & spacing for noise control
Physical Hierarchy Generation: Problem Formulation
Physical Hierarchy = Placement bins + module locations
[Figure: a logical hierarchy of hard IP and soft modules (same color for modules of the same logic hierarchy) is mapped onto the physical hierarchy]
• Assign modules to the physical hierarchy
• The assignment defines the global interconnects
• Optimization objectives:
  - wire length minimization
  - routing congestion minimization
  - clock period, latency, performance (with consideration of multi-cycle comm.)
Need of Considering Retiming during Placement
- Retiming/pipelining on global interconnects
• Multiple clock cycles are needed to cross the chip
• Proper placement allows retiming to hide global interconnect delays
[Figure: Placement 1 and Placement 2 of nodes a, b, c, d; for each, d(v) = 1, WL = 6, d(e) ∝ WL]
Before retiming: φ = 4.0 and φ = 5.0 (the placement with φ = 4.0 is marked "Better Initial Placement !!")
After retiming: φ = 3.0 and φ = 4.0
Difficulties
• How to consider retiming/pipelining over global interconnects
  - Flip-flop boundaries are not fixed during placement, so static timing analysis is difficult
  Answer: Use the concepts of c-retiming and sequential timing analysis (Seq-TA)
• How to handle the high complexity of the combined problem
  Answer: Use the multi-level optimization technique
Simultaneous Coarse Placement with Retiming on Interconnects
• Our solution
  - Compute the labels of all nodes under c-retiming for a given placement solution and perform sequential timing analysis (Seq-TA)
  - Minimize the longest sequential path by improving the placement solution
• Alternative solution [Brayton, et al]
  - Enforcing all loop constraints during placement
Static Timing Analysis (STA)
Sequential circuit example: PI: a, b. PO: g.
[Figure: sequential circuit with nodes a, b, c, d, e, f, g, and the DAG obtained from it]
Suppose d(v) = 1, d(e) = 2, and clock cycle φ = 11
• Transform the circuit into a DAG for static timing analysis
• Topological order: a, b, g, f, c, d, e
• The arrival time (AT) and required time (RT) of each node are computed in linear time
Values annotated on the figure:
AT: 1 1 3 3 3 6 9
RT: 9 9 11 9 3 6 9
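The linear-time AT/RT pass can be sketched as below. This is a minimal illustration, not the authors' tool: the four-node DAG (x, y, z, w) is hypothetical and is not the circuit on the slide. The convention that AT includes the node's own delay and that sinks take RT = φ is an assumption chosen to match the slide's d(v) = 1, d(e) = 2, φ = 11 setting.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def static_timing(d_node, edges, clock_period):
    """Compute arrival and required times on a DAG.

    d_node: dict node -> node delay d(v)
    edges:  dict (u, v) -> edge delay d(u, v)
    Assumed convention: AT(v) includes d(v); RT of a sink is the clock period.
    """
    preds = {v: [] for v in d_node}
    succs = {v: [] for v in d_node}
    for (u, v), d in edges.items():
        preds[v].append((u, d))
        succs[u].append((v, d))

    # TopologicalSorter expects node -> predecessors.
    order = list(
        TopologicalSorter({v: [u for u, _ in preds[v]] for v in d_node}).static_order()
    )

    at = {}
    for v in order:  # forward pass: latest arrival over all fan-ins
        at[v] = d_node[v] + max((at[u] + d for u, d in preds[v]), default=0)

    rt = {}
    for v in reversed(order):  # backward pass: earliest requirement over all fan-outs
        rt[v] = min((rt[s] - d - d_node[s] for s, d in succs[v]), default=clock_period)

    return at, rt


# Hypothetical DAG: x -> y -> z and w -> z, with d(v) = 1 and d(e) = 2.
d_node = {"x": 1, "y": 1, "z": 1, "w": 1}
edges = {("x", "y"): 2, ("y", "z"): 2, ("w", "z"): 2}
at, rt = static_timing(d_node, edges, clock_period=11)
slack = {v: rt[v] - at[v] for v in d_node}
print(at, rt, slack)
```

Each node and edge is visited once in each pass, which is where the linear running time comes from.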
Continuous Retiming (c-retiming) and Sequential Arrival Time (SAT)
• Definition [Pan et al., TCAD98]
  - Given a clock period φ, transform circuit C into an edge-weighted, vertex-weighted graph G
  - Label vertex v as l(v) = the weight of the longest path from PIs to v = max{ l(u) - φ·w(u,v) + d(u,v) + d(v) }, with the maximum taken over all fan-in edges (u,v); l(v) is also called SAT(v)
• Theorem: C can be retimed to φ + max{d(v)} iff l(POs) ≤ φ
• Relation to retiming: r(v) = ⌈l(v)/φ⌉ - 1
• Complexity is O(VE)
Example:
[Figure: nodes a and b drive node c; edge (a,c) has weight w_l(a,c) = d(e(a,c)) - φ·w(a,c) with w(a,c) = 1; edge (b,c) has weight w_l(b,c) = d(e(b,c)) - φ·w(b,c) with w(b,c) = 0]
d(a) = d(b) = 1, d(a,c) = d(b,c) = 2, φ = 5
l(a) = 7, l(b) = 3
l(c) = max{7 + 2 - 5·1 + 1, 3 + 2 + 1} = 6
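The label definition lends itself to a Bellman-Ford-style longest-path iteration, which also gives the O(VE) bound. The sketch below is an illustration under assumptions, not the implementation from [Pan et al, TCAD98]: the function name, the `seed_labels` parameter, and the edge tuple layout are all hypothetical. It is checked against the slide's three-node example, where l(a) = 7 and l(b) = 3 are taken as given.

```python
import math


def sequential_arrival_times(nodes, edges, d_node, seed_labels, phi):
    """Iterate l(v) = max{ l(u) - phi*w(u,v) + d(u,v) + d(v) } to a fixed point.

    edges:       list of (u, v, d_edge, w), w = number of flip-flops on the edge
    seed_labels: dict of nodes whose labels are given (e.g. primary inputs)
    Returns the label dict, or None if labels still grow after |V| passes,
    which indicates a positive-weight cycle (phi is then not achievable).
    """
    label = {v: -math.inf for v in nodes}
    label.update(seed_labels)

    for _ in range(len(nodes)):
        changed = False
        for u, v, d_edge, w in edges:
            if label[u] == -math.inf:
                continue  # u not reached yet
            candidate = label[u] - phi * w + d_edge + d_node[v]
            if candidate > label[v]:
                label[v] = candidate
                changed = True
        if not changed:
            return label
    return None


# The slide's example: l(a) = 7, l(b) = 3, w(a,c) = 1, w(b,c) = 0, phi = 5.
labels = sequential_arrival_times(
    nodes=["a", "b", "c"],
    edges=[("a", "c", 2, 1), ("b", "c", 2, 0)],
    d_node={"a": 1, "b": 1, "c": 1},
    seed_labels={"a": 7, "b": 3},
    phi=5,
)
print(labels["c"])  # 6, matching the slide

# A hypothetical two-node loop with no flip-flops: labels grow without bound.
looping = sequential_arrival_times(
    nodes=["x", "y"],
    edges=[("x", "y", 1, 0), ("y", "x", 1, 0)],
    d_node={"x": 1, "y": 1},
    seed_labels={"x": 0},
    phi=5,
)
print(looping)  # None
```

With the labels in hand, the theorem's test reduces to checking that the label of every PO is at most φ.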
Continuous Retiming (c-retiming) and Sequential Arrival Time (SAT)
[Figure: sequential circuit with nodes a, b, c, d, e, f, g; its retiming graph (not a DAG) with edge weights 2, -2.5, and -7; and the retimed circuit]
d(v) = 1, d(e) = 2
Is φ = 4.5 possible?

Iter#   a    b    c      d      e      f     g
0       0    0    -∞     -∞     -∞     -∞    -∞
1       0    0    -1.5   -∞     -∞     -∞    -∞
2       0    0    -1.5   1.5    1.5    -∞    -∞
3       0    0    -1.5   1.5    4.5    0     0
4       0    0    -1.5   1.5    4.5    0     0
5       0    0    -1.5   1.5    4.5    0     0

Cycle time 4.5 is possible because l(g) ≤ 4.5
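A short worked calculation ties the figure's edge weights to the definition, using only the slide's values d(e) = 2 and φ = 4.5. Each edge of the retiming graph carries weight d(u,v) - φ·w(u,v). Reading the three weights as edges with 0, 1, and 2 flip-flops is inferred from the arithmetic and is not stated on the slide.

\[
\begin{aligned}
w = 0 &: \quad 2 - 4.5 \cdot 0 = 2 \\
w = 1 &: \quad 2 - 4.5 \cdot 1 = -2.5 \\
w = 2 &: \quad 2 - 4.5 \cdot 2 = -7
\end{aligned}
\]

Under the same assumption, the first finite label in the table, l(c) = -1.5 at iteration 1, is consistent with an update through a weight -2.5 edge from a primary input labelled 0:

\[
l(c) = 0 + (-2.5) + d(c) = -2.5 + 1 = -1.5
\]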
Continuous Retiming (c-retiming) and Sequential Arrival Time (SAT) (cont'd)
[Figure: sequential circuit with nodes a, b, c, d, e, f, g, and its retiming graph (not a DAG)]
d(v) = 1, d(e) = 2
Is φ = 2.5 feasible?

Iter#   a    b    c      d      e      f     g
0       0    0    -∞     -∞     -∞     -∞    -∞
1       0    0    0.5    -∞     -∞     -∞    -∞
2       0    0    0.5    3.5    3.5    -∞    -∞
3       0    0    0.5    3.5    6.5    4     4

Cycle time 2.5 is not feasible because l(g) > 2.5
Multi-Level Optimization Framework
[Figure: V-shaped diagram of levels; coarsening on the way down, uncoarsening and refinement (optimization) on the way up; problem sizes shrink toward the top levels]
• Multi-level coarsening generates smaller problem sizes for top levels → faster optimization on top levels
• May explore different aspects of the solution space at different levels
• Gradual refinement on good solutions from coarser levels is very efficient
• Successful in many applications
  - Originally developed for PDE
  - Recent success in VLSICAD: partitioning, placement, routing