Ultra-Fast Interconnect Driven Cell Cloning For Minimizing Critical Path Delay
Zhuo Li (1), David A. Papa (2,1), Charles J. Alpert (1), Shiyan Hu (3), Weiping Shi (4), C. N. Sze (1) and Ying Zhou (1)
(1) IBM Austin Research Lab; (2) Dept. EECS, University of Michigan; (3) Dept. ECE, Michigan Technological University; (4) Dept. ECE, Texas A&M University
Best Value Toys
[Figure: toolbox of physical-synthesis transforms: global placement, routability analysis / recovery, cell movement, buffering, Vt assignment, layer assignment, gate sizing; price tag "$15K"; "Cloning?"]
Optimal Timing-Driven Cloning - ISPD 2010 3/18/2010
Best Value Toys
[Figure: the same toolbox with cloning added: global placement, routability analysis / recovery, cell movement, buffering, Vt assignment, layer assignment, gate sizing, cloning]
Cloning in Logic Synthesis
[Figure: a gate and its fan-out sinks, labeled S_1 through S_100]
Hwang, ICCAD 1992; J. Lillis, ISCAS 1996; A. Srivastava, TCAD 2001; …
• Reduce net cut and total capacitance load (NP-hard)
• Ignore physical information (interconnect, buffering, …)
Interconnect Driven Cloning
[Figure: fan-ins F_1 and F_2 drive P, which drives sinks S_1 and S_2; edges carry delay labels 1 and 3. Panel without the clone: slack 1 at S_1, slack -1 at S_2. Panel with clone P’ (edge label 2.5): slack 1 at S_1, slack -0.5 at S_2.]
AT(D_1) = AT(D_2) = 0, RAT(S_1) = RAT(S_2) = 5
Interconnect Driven Cloning
[Figure: the same circuit, edges labeled 1 and 3. Panel without the clone: slack 1 at S_1, slack -1 at S_2. Panel with clone P’ driving S_2: slack 1 at both S_1 and S_2.]
AT(D_1) = AT(D_2) = 0, RAT(S_1) = RAT(S_2) = 5
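The slack values in this example can be checked with a small sketch. The coordinates are hypothetical, chosen only so that the Manhattan distances reproduce the edge labels 1 and 3; the unit delay per unit distance and the helper names `manhattan`, `arrival`, and `sink_slack` are also assumptions, and AT(D_1) = AT(D_2) = 0 is read here as the arrival times at the two fan-ins.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def arrival(fanins, gate, delay_per_unit=1.0):
    """Arrival time at `gate`: latest fan-in arrival plus linear wire delay."""
    return max(at + delay_per_unit * manhattan(loc, gate) for loc, at in fanins)


def sink_slack(fanins, gate, sink, rat, delay_per_unit=1.0):
    """Slack at `sink` when it is driven by `gate`."""
    at_sink = arrival(fanins, gate, delay_per_unit) + delay_per_unit * manhattan(gate, sink)
    return rat - at_sink


# Hypothetical placement (not from the slides):
# distances F1-P = 1, F2-P = 3, P-S1 = 1, P-S2 = 3.
F = [((0, 0), 0.0), ((0, -2), 0.0)]  # (location, arrival time) for F1, F2
S1, S2 = (2, 0), (2, -2)
P = (1, 0)
P_CLONE = (1, -2)  # assumed location for the clone P'
RAT = 5.0

# Without cloning, P drives both sinks.
before = (sink_slack(F, P, S1, RAT), sink_slack(F, P, S2, RAT))
# With cloning, P keeps S1 and P' takes S2.
after = (sink_slack(F, P, S1, RAT), sink_slack(F, P_CLONE, S2, RAT))
print(before, after)
```

Under these assumed coordinates the worst slack moves from -1 to 1, matching the slack labels on the slide.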
Our Contribution
Find the “optimal” partitioning and placement of the original and duplicated gates
• Assuming a linear buffer delay model
• O(n) algorithm when the original gate is fixed
• O(n log n) algorithm when the original gate is movable
• Focus only on worst slack
• For interconnect-delay-dominant sub-circuits
Extensions
Back-of-the-envelope filter
• Logic-based cloning: high fan-out / capacitive load
• Physical-based cloning: special fan-out location distributions
Cloning Problem
• A sub-circuit
• Two-pin timing arcs: D = σ · dis(G_1, G_2)
• Clone P to P’; find the partitioning of S and the locations of P and P’ that maximize sub-circuit slack
[Figure: F (fan-ins), P, S (fan-outs)]
Cloning Problem
• A sub-circuit
• Two-pin timing arcs: D = σ · dis(G_1, G_2)
• Clone P to P’; find the partitioning of S and the locations of P and P’ that maximize sub-circuit slack
[Figure: F (fan-ins), P and P’, S (fan-outs)]
Cloning Problem
• Reduces to a gate placement problem when the partitioning is given (RUMBLE, ISPD08, and Pyramids, ICCAD08)
• Perform real buffering after cloning
[Figure: fan-ins, point O, fan-outs]
Arrival Time Arc
• Each fan-in gate has an arrival time AT(F_i)
• For each physical point v, AT(v) = max_i ( AT(F_i) + τ · Dis(F_i, v) )
• The set of points minimizing AT(v) is the arrival time arc K(F)
• K(F) is either a Manhattan arc or a single point
• Similar to Deferred Merge Embedding (DME)
• K(F) is also the bottom of the trough AT(v) (the overlap of a set of reverse pyramids)
[Figure: examples of K(F) for fan-ins with AT = 1, AT = 3, and AT = 5]
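As an illustration of the definition above, the sketch below evaluates AT(v) on a small integer grid and keeps the minimizing points. It is a brute-force check, not the linear-time construction from the talk; the function names, the grid, and the two fan-in locations are hypothetical.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def arrival_time(fanins, v, tau=1.0):
    """AT(v) = max over fan-ins of AT(F_i) + tau * Dis(F_i, v)."""
    return max(at + tau * manhattan(loc, v) for loc, at in fanins)


def arrival_time_arc_on_grid(fanins, points, tau=1.0):
    """Return (min AT, sorted grid points attaining it): a sampled K(F)."""
    at = {v: arrival_time(fanins, v, tau) for v in points}
    best = min(at.values())
    return best, sorted(v for v, value in at.items() if value == best)


# Hypothetical fan-ins: (location, arrival time).
fanins = [((0, 0), 0.0), ((2, 2), 0.0)]
grid = [(x, y) for x in range(-1, 4) for y in range(-1, 4)]

best_at, arc = arrival_time_arc_on_grid(fanins, grid)
print(best_at, arc)
```

For these two fan-ins the minimizers lie on a diagonal segment, which is the Manhattan-arc case; fan-ins on a common horizontal line with even separation and equal arrival times would give the single-point case.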
Best Region and Best Arrival Time Arc
• K(S): required arrival time arc (maximizing RAT(v))
• Best region Z: every point inside this region has maximum sub-circuit slack (constructed with K(F) and K(S))
• Best arrival time arc B: the intersection of the best region and the arrival time arc
• Define K(F_i) as the arrival time arc for F_1, …, F_i
• O(n) time to compute K(F), K(S), Z and B
• Also O(n) time to compute all K(F_i) and K(S_i), instead of O(n^2) time
[Figure: three configurations of K(F), K(S), Z, and B]
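The relation between Z, K(F), and B can be illustrated by brute force on a grid. This sketch assumes that RAT(v) = min_j ( RAT(S_j) - τ · Dis(v, S_j) ), by symmetry with AT(v), and that the sub-circuit slack at v is RAT(v) - AT(v); neither formula is spelled out on the slides. The coordinates and the name `slack_map` are hypothetical, and the talk's O(n) construction is not reproduced.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def slack_map(fanins, sinks, points, tau=1.0):
    """Return {v: (AT(v), slack(v))} for every candidate point v."""
    result = {}
    for v in points:
        at_v = max(at + tau * manhattan(loc, v) for loc, at in fanins)
        # Assumed form of RAT(v), mirroring the definition of AT(v).
        rat_v = min(rat - tau * manhattan(v, loc) for loc, rat in sinks)
        result[v] = (at_v, rat_v - at_v)
    return result


# Hypothetical sub-circuit: (location, AT) for fan-ins, (location, RAT) for sinks.
fanins = [((0, 0), 0.0), ((0, -2), 0.0)]
sinks = [((2, 0), 5.0), ((2, -2), 5.0)]
grid = [(x, y) for x in range(-1, 4) for y in range(-3, 2)]

table = slack_map(fanins, sinks, grid)
best_slack = max(s for _, s in table.values())
min_at = min(a for a, _ in table.values())

Z = sorted(v for v, (_, s) in table.items() if s == best_slack)  # best region
K_F = sorted(v for v, (a, _) in table.items() if a == min_at)    # arrival time arc
B = sorted(set(Z) & set(K_F))                                    # best arrival time arc
print(best_slack, Z, K_F, B)
```

In this instance Z is a horizontal segment between the fan-ins and the sinks, and B is the single point of Z that also minimizes arrival time.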
Case 1: P is movable
• No matter what the partitioning is, P and P’ can be placed on the best arrival time arc while still achieving the best slack
• Divide the whole plane into 6 regions based on slack curves
[Figure: regions H_1 to H_6 around K(F), each shown with its slack curve between i and j]
Case 1: P is movable (Cont.)
• If there are no gates in H_6: O(n) time algorithm
[Figure: regions H_1 to H_6 around K(F), each shown with its slack curve between i and j]
Case 1: P is movable (Cont.)
• If there are gates in H_6, treat all slack curves as 3-segment trapezoid-like curves
• O(n) time algorithm
[Figure: trapezoid-like slack curves between i and j]
Case 1: P is movable (Cont.)
• If there are gates in H_6, treat all slack curves as 3-segment trapezoid-like curves
• O(n) time algorithm
[Figure: trapezoid-like slack curves between i and j, with P and P’ marked]
Case 2: P is fixed
• P may not be on K(F)
• Slack_P(i) = RAT(i) – τ · Dis(P, i) – AT(P)
• At most O(n) partitions, since there are only n possible worst slack values for any partitioning
• Sort S_i accordingly
• Let P drive the sets of fan-outs {S_1}, {S_1, S_2}, {S_1, S_2, S_3}, …
• O(n log n) time algorithm
[Figure: Slack_P plotted against Dis(P, i)]
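The prefix enumeration above can be sketched as follows. This is a simplified illustration, not the O(n log n) algorithm from the talk: the clone location is found by brute force over a caller-supplied candidate list, the required time at a candidate point is assumed to be min_j ( RAT(S_j) - τ · Dis(v, S_j) ), and an empty clone group (no cloning) is allowed as the last prefix. All names and coordinates are hypothetical.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def best_partition_fixed_p(p, fanins, sinks, candidates, tau=1.0):
    """Try P driving each prefix of the sinks sorted by Slack_P, best first.

    fanins: list of (location, AT); sinks: dict name -> (location, RAT).
    Returns (worst slack, sinks kept by P, chosen clone location or None).
    """
    def at(v):
        return max(a + tau * manhattan(loc, v) for loc, a in fanins)

    at_p = at(p)
    # Slack_P(i) = RAT(i) - tau * Dis(P, i) - AT(P)
    slack_p = {s: rat - tau * manhattan(p, loc) - at_p for s, (loc, rat) in sinks.items()}
    order = sorted(sinks, key=lambda s: slack_p[s], reverse=True)

    best = None
    for k in range(1, len(order) + 1):
        p_group, clone_group = order[:k], order[k:]
        p_slack = slack_p[order[k - 1]]  # worst slack among sinks kept by P
        if clone_group:
            # Brute-force stand-in for the optimal placement of P'.
            clone_slack, clone_loc = max(
                (min(sinks[s][1] - tau * manhattan(v, sinks[s][0]) for s in clone_group) - at(v), v)
                for v in candidates
            )
        else:
            clone_slack, clone_loc = float("inf"), None  # no clone needed
        worst = min(p_slack, clone_slack)
        if best is None or worst > best[0]:
            best = (worst, p_group, clone_loc)
    return best


fanins = [((0, 0), 0.0), ((0, -2), 0.0)]
sinks = {"S1": ((2, 0), 5.0), "S2": ((2, -2), 5.0)}
grid = [(x, y) for x in range(-1, 4) for y in range(-3, 2)]

worst, kept_by_p, clone_loc = best_partition_fixed_p((1, 0), fanins, sinks, grid)
print(worst, kept_by_p, clone_loc)
```

Only prefixes need to be tried because a sink with better Slack_P than the current worst sink of P can stay with P without lowering P's worst slack, and removing it from the clone's group can only help the clone.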
One Example
[Figure: two panels, "Original circuit" and "After buffering", each showing F_1, F_2, P, S_1, S_2; slack labels -1.2 and -0.6]
One Example
[Figure: two panels, "RUMBLE" and "P is fixed", showing F_1, F_2, P, S_1, S_2, a buffer, and the clone P’; slack labels -0.8, 0, -0.5, -0.8]
One Example
[Figure: two panels, "P is movable" and "Bad wirelength solution", each showing F_1, F_2, P, P’, S_1, S_2; slack labels 0, 0, 0, -0.5]
Experimental Results
100 random 65 nm sub-circuits
• P is fixed: 279 ps better than pure buffering, 87 ps better than RUMBLE on average
• P is movable: 309 ps better than pure buffering, 117 ps better than RUMBLE on average

                        Single transform            Compared to a flow with pure buffering
65 nm macros   # objs   Slack imprv.   FOM imprv.   Slack imprv.   FOM imprv.   Area increase
Macro 1        91k      0.480 ns       438          0.097 ns       -8           0.5%
Macro 2        231k     0.098 ns       0            0.081 ns       200          0.8%
Macro 3        191k     0.383 ns       2837         0.124 ns       280          1%
Extensions
• Duplicate more than two gates: O(n^2) algorithm
• Be smart about Z regions: latches, blockages
• Wire-length FOM extension
[Figure: K(S), K(F), Z, and B around a blockage]
Voronoi Diagram Partitioning
• Best slack for every sink with blockages
• If the locations of P and P’ are known, the optimal partitioning is the Voronoi diagram between two points, or between a point and a diamond, in Manhattan space
• Only O(n^3) possible partitionings
• Try all partitionings and find the best one
[Figure: sinks A and B partitioned between P and P’; edge labels 5 and 1]
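One way to read the Voronoi rule is that, once P and P’ are placed, each sink goes to whichever gate gives it the earlier arrival (and so the better slack). The sketch below implements that reading under the linear delay model. It ignores blockages, breaks ties in favor of P, and uses hypothetical names and coordinates.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def arrival(fanins, gate, tau=1.0):
    """Arrival time at `gate` under the linear delay model."""
    return max(at + tau * manhattan(loc, gate) for loc, at in fanins)


def voronoi_partition(fanins, sinks, p, p_clone, tau=1.0):
    """Assign each sink to the gate through which its arrival time is smaller.

    With equal arrival times at P and P' this is the Manhattan Voronoi
    diagram of the two points; unequal arrival times shift the boundary.
    """
    at_p, at_clone = arrival(fanins, p, tau), arrival(fanins, p_clone, tau)
    groups = {"P": [], "P'": []}
    for name, loc in sinks.items():
        via_p = at_p + tau * manhattan(p, loc)
        via_clone = at_clone + tau * manhattan(p_clone, loc)
        groups["P" if via_p <= via_clone else "P'"].append(name)
    return groups


fanins = [((0, 0), 0.0), ((0, -2), 0.0)]
sinks = {"S1": (2, 0), "S2": (2, -2)}
groups = voronoi_partition(fanins, sinks, p=(1, 0), p_clone=(1, -2))
print(groups)
```

Because the required times do not affect which gate reaches a sink sooner, the assignment depends only on the gate locations and their arrival times.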