Regular Fabrics for Retiming & Pipelining over Global Interconnects
Jason Cong
Computer Science Department
University of California, Los Angeles
cong@cs.ucla.edu
http://cadlab.cs.ucla.edu/~cong
FCRP Interconnect Workshop, June 28, 2002
DUSD(Labs)
Overarching GSRC Research Emphasis [Jan Rabaey, June 2002]
- A broadened focus on application-oriented embedded systems under tight cost, PDA, and time-to-market constraints
- Founded on One Basic Principle: "From Ad-Hoc System-on-a-Chip Design to Disciplined, Platform-Based Design"
The Discipline of Platform-Based Design
[Figure: platform stack, top to bottom. Application; Programming Model (Kernels/Benchmarks, Models/Estimators); Architecture(s); Architectural Platform; Microarchitecture(s) (Functional Blocks, Interconnect; cycle-speed, power, area); Circuit Fabric(s); Silicon Implementation Platform; Manufacturing Interface (basic device & interconnect structures; delay, variation, SPICE models); Silicon Implementation.]
The Discipline of Platform-Based Design
[Figure: the same platform stack, from Application down to Silicon Implementation, overlaid with the labels Comp and Comm Based Design; Programmable Systems; Calibrating Achievable Design; Test, Verification, Energy&Power; Constructive Fabrics.]
From Architecture to Silicon Implementation Platform
- Different targets employ different intermediate platforms, hence different layers of regularity and design-space constraints
- Design space may actually be smaller than with large steps!
- Large-step predictions/abstractions may misguide the optimizations
[Figure: steps from Architecture to Silicon Implementation: Logic Regularity; Component Regularity and Reuse; Regular Fabrics; Geometrical Regularity. Label: Constructive Fabrics Th]
[Source: Larry Pileggi]
Sample Work from the GSRC Fabric Theme
- Bob Brayton: Topologically Constrained Logic Synthesis
- Malgorzata Marek-Sadowska: Interconnecting Regular Fabrics
- Wojtek Maly: Geometrical Regularity
- Herman Schmit: Regular Communication Fabrics
- Jason Cong: Regular Fabrics for Retiming and Pipelining over Global Interconnects
Motivation: How Far Can We Go in Each Clock Cycle
- NTRS'97 0.07um Tech
- 5 GHz across-chip clock
- 620 mm^2 (24.9mm x 24.9mm)
- IPEM BIWS estimations
- Buffer size: 100x
- Driver/receiver size: 100x
- From corner to corner: 7 clock cycles
[Figure: die map with reach contours labeled 1 clock through 7 clock; axis ticks at 0, 7.52, 15.04, 22.56, 24.9 (mm)]
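A quick sanity check of the corner-to-corner figure, as a minimal sketch. It assumes that the 7.52 mm axis tick spacing is the distance a signal covers in one clock cycle and that wires follow Manhattan routes; neither is stated on the slide, and the variable names are illustrative only.

```python
import math

die_side_mm = 24.9          # from the slide: 24.9mm x 24.9mm die
reach_per_cycle_mm = 7.52   # ASSUMPTION: axis tick spacing = reach per clock cycle

# ASSUMPTION: Manhattan routing, so corner to corner is two die sides.
corner_to_corner_mm = 2 * die_side_mm

# Number of whole clock cycles needed to cover that distance.
cycles = math.ceil(corner_to_corner_mm / reach_per_cycle_mm)
print(cycles)
```

Under these assumptions the result agrees with the 7 clock cycles quoted on the slide.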
Solutions
- Fully asynchronous designs
- GALS (globally asynchronous, locally synchronous designs)
- Latency-insensitive designs
- Synchronous designs, with multi-cycle communications
  - Much better understood
  - Supported by the current tool set
  - More energy efficient?
Need of Considering Retiming during Placement
- Retiming/pipelining on global interconnects
  - Multiple clock cycles are needed to cross the chip
  - Proper placement allows retiming to hide global interconnect delays.
[Figure: Placement 1 and Placement 2 of the four nodes a, b, c, d, each with d(v)=1, WL=6, d(e) ∝ WL. Labels: "Before retiming, φ = 4.0"; "Before retiming, φ = 5.0"; "Better Initial Placement !!"; "After retiming, φ = 3.0"; "After retiming, φ = 4.0"]
Difficulties
- How to consider retiming/pipelining over global interconnects
  - Flip-flop boundaries are not fixed during placement, difficult to do static timing analysis
  - Use of the concepts of c-retiming and sequential timing analysis (Seq-TA)
- How to handle the high complexity of the combined problem
  - Use the multi-level optimization technique
Static Timing Analysis (STA)
Sequential circuit example: PI: a, b. PO: g. Suppose d(v)=1, d(e)=2, and clock cycle φ = 11.
[Figure: the sequential circuit on nodes a to g, and the DAG derived from it]
- Transform the circuit into a DAG for static timing analysis
- Topological order: a, b, g, f, c, d, e
- The arrival time (AT) and required time (RT) of each node are computed in linear time.

Node   a   b   g   f   c   d   e
AT     1   1   3   3   3   6   9
RT     9   9  11   9   3   6   9
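The linear-time AT/RT computation can be sketched as one forward and one backward pass over a topological order. This is a minimal illustration, not the tool used in the talk. The three-node chain c -> d -> e, the function name `static_timing`, and the boundary values `source_at` and `sink_rt` are assumptions, since the figure's edges are not recoverable from the slide text; the boundary values are chosen to match the c, d, e entries listed on the slide.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

def static_timing(preds, node_delay, edge_delay, source_at, sink_rt):
    """Compute arrival times (AT) and required times (RT) on a DAG.

    preds maps each node to the list of its predecessors.
    source_at gives AT for nodes without predecessors,
    sink_rt gives RT for nodes without successors.
    """
    order = list(TopologicalSorter(preds).static_order())
    succs = {v: [] for v in preds}
    for v, ps in preds.items():
        for u in ps:
            succs[u].append(v)

    at = {}
    for v in order:  # forward pass: latest arriving fanin dominates
        if preds[v]:
            at[v] = max(at[u] + edge_delay for u in preds[v]) + node_delay
        else:
            at[v] = source_at[v]

    rt = {}
    for v in reversed(order):  # backward pass: tightest fanout dominates
        if succs[v]:
            rt[v] = min(rt[s] - node_delay - edge_delay for s in succs[v])
        else:
            rt[v] = sink_rt[v]
    return at, rt

# HYPOTHETICAL chain c -> d -> e with d(v)=1, d(e)=2.
preds = {"c": [], "d": ["c"], "e": ["d"]}
at, rt = static_timing(preds, node_delay=1, edge_delay=2,
                       source_at={"c": 3}, sink_rt={"e": 9})
print(at, rt)
```

Each node and edge is visited a constant number of times, which is where the linear running time comes from.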
Continuous Retiming (c-retiming) and Sequential Arrival Time (SAT)
- Definition [Pan et al., TCAD98]
  - Given a clock period φ, transform circuit C into an edge-weighted, vertex-weighted graph G.
  - Label vertex v as l(v) = the weight of the longest path from the PIs to v:
      l(v) = max{ l(u) - φ·w(u,v) + d(u,v) + d(v) }
    l(v) is also called SAT(v).
- Theorem: C can be retimed to φ + max{d(v)} iff l(POs) ≤ φ
- Relation to retiming: r(v) = ⌈l(v)/φ⌉ - 1
- Complexity is O(VE)
Example (φ = 5): nodes a and b drive node c.
  d(a) = d(b) = 1, d(a,c) = d(b,c) = 2, w(a,c) = 1, w(b,c) = 0
  l(a) = 7, l(b) = 3
  Edge weights: w_l(a,c) = d(e(a,c)) - φ·w(a,c), w_l(b,c) = d(e(b,c)) - φ·w(b,c)
  l(c) = max{7+2-5·1+1, 3+2+1} = 6
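The l(c) example on this slide can be reproduced with a few lines of code. This is a sketch only: the helper name `sat_label` and its tuple layout are made up for illustration, and d(c) = 1 is an assumption read off the "+1" term in the slide's arithmetic.

```python
def sat_label(fanins, d_v, phi):
    """l(v) = max over fanin edges of l(u) - phi*w(u,v) + d(u,v) + d(v).

    fanins is a list of (l_u, w_uv, d_uv) tuples, one per fanin edge.
    """
    return max(l_u - phi * w_uv + d_uv + d_v for l_u, w_uv, d_uv in fanins)

phi = 5
fanins_of_c = [
    (7, 1, 2),  # edge (a,c): l(a)=7, w(a,c)=1, d(a,c)=2
    (3, 0, 2),  # edge (b,c): l(b)=3, w(b,c)=0, d(b,c)=2
]
l_c = sat_label(fanins_of_c, d_v=1, phi=phi)  # ASSUMPTION: d(c) = 1
print(l_c)
```

The path through a contributes 7 + 2 - 5 + 1 = 5 and the path through b contributes 3 + 2 + 1 = 6, so the register on edge (a,c) is what keeps the later-arriving signal from a off the critical path.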
Continuous Retiming (c-retiming) and Sequential Arrival Time (SAT)
[Figure: the sequential circuit on nodes a to g; its retiming graph (not a DAG), with edge weights 2, -2.5, and -7; and the retimed circuit. d(v)=1, d(e)=2]
Is φ = 4.5 possible?

Iter#   a   b   c      d     e     f    g
0       0   0   -∞     -∞    -∞    -∞   -∞
1       0   0   -1.5   -∞    -∞    -∞   -∞
2       0   0   -1.5   1.5   1.5   -∞   -∞
3       0   0   -1.5   1.5   4.5   0    0
4       0   0   -1.5   1.5   4.5   0    0
5       0   0   -1.5   1.5   4.5   0    0

Cycle time 4.5 is possible because l(g) ≤ 4.5
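The iterative labeling shown in the table can be sketched as a Bellman-Ford-style longest-path relaxation on the retiming graph. This is an illustration under stated assumptions, not the algorithm of Pan et al. as published: the function names, the three-node graph (a -> x -> y with a feedback edge y -> x), and the choice to start PI labels at 0 (as in the table's row 0) are all hypothetical.

```python
import math

def sequential_arrival_times(nodes, edges, d, primary_inputs, phi):
    """Relax l(v) = max{ l(u) - phi*w + d(u,v) + d(v) } until it stabilizes.

    edges is a list of (u, v, w, edge_delay). Returns the label dict, or
    None if labels still grow after |V| passes (a positive cycle exists).
    """
    l = {v: -math.inf for v in nodes}
    for pi in primary_inputs:
        l[pi] = 0.0  # ASSUMPTION: PI labels start at 0
    for _ in range(len(nodes)):
        changed = False
        for u, v, w, edge_delay in edges:
            if l[u] == -math.inf:
                continue  # u not reached yet
            cand = l[u] - phi * w + edge_delay + d[v]
            if cand > l[v] + 1e-12:
                l[v] = cand
                changed = True
        if not changed:
            return l
    return None

def labels_meet_period(nodes, edges, d, primary_inputs, primary_outputs, phi):
    """True when the labels converge and l(PO) <= phi for every PO."""
    l = sequential_arrival_times(nodes, edges, d, primary_inputs, phi)
    return l is not None and all(l[po] <= phi for po in primary_outputs)

# HYPOTHETICAL graph: one register on a->x and one on the feedback y->x.
nodes = ["a", "x", "y"]
d = {"a": 1, "x": 1, "y": 1}
edges = [("a", "x", 1, 2), ("x", "y", 0, 2), ("y", "x", 1, 2)]

ok_at_6 = labels_meet_period(nodes, edges, d, ["a"], ["y"], phi=6.0)
ok_at_4_5 = labels_meet_period(nodes, edges, d, ["a"], ["y"], phi=4.5)
print(ok_at_6, ok_at_4_5)
```

In this toy graph the loop x -> y -> x carries a total delay of 6 and a single register, so the labels converge at φ = 6 but keep growing at φ = 4.5. Each pass touches every edge once and at most |V| passes are made, matching the O(VE) complexity quoted for c-retiming.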