
Ultra-Fast Interconnect Driven Cell Cloning For Minimizing Critical Path Delay



  1. Ultra-Fast Interconnect Driven Cell Cloning For Minimizing Critical Path Delay
     Zhuo Li (1), David A. Papa (2,1), Charles J. Alpert (1), Shiyan Hu (3), Weiping Shi (4), C. N. Sze (1) and Ying Zhou (1)
     (1) IBM Austin Research Lab; (2) Dept. EECS, University of Michigan; (3) Dept. ECE, Michigan Technological University; (4) Dept. ECE, Texas A&M University

  2. Best Value Toys
     [Figure labels: Global placement; Routability analysis / recovery; Cell movement; Buffering; Vt assignment; Layer assignment; Gate sizing; "$15K"; "Cloning?"]
     (Slide footer: Optimal Timing-Driven Cloning - ISPD 2010 3/18/2010)

  3. Best Value Toys (cont.)
     [Figure labels: Global placement; Routability analysis / recovery; Cell movement; Buffering; Vt assignment; Layer assignment; Gate sizing; Cloning]

  4. Cloning in Logic Synthesis
     • Hwang, ICCAD 1992; J. Lillis, ISCAS 1996; A. Srivastava, TCAD 2001; …
     • Reduce net cut and total capacitance load (NP-hard)
     • Ignore physical information (interconnect, buffering, …)
     [Figure: sinks labeled S1, S2, …, S100]

  5. Interconnect Driven Cloning
     AT(D1) = AT(D2) = 0, RAT(S1) = RAT(S2) = 5
     [Figure: two versions of a sub-circuit with fan-ins F1, F2, gate P, and fan-outs S1, S2; the second adds the clone P’. Slack labels in order of appearance: 1, 1, -1, -0.5]

  6. Interconnect Driven Cloning (cont.)
     AT(D1) = AT(D2) = 0, RAT(S1) = RAT(S2) = 5
     [Figure: two versions of a sub-circuit with fan-ins F1, F2, gate P, and fan-outs S1, S2; the second adds the clone P’ next to S2. Slack labels in order of appearance: 1, 1, -1, 1]

  7. Our Contribution
     • Find the “optimal” partitioning and placement of the original and duplicated gates
       - Assuming linear-buffer-delay model
       - O(n) algorithm when original gate is movable
       - O(n log n) algorithm when original gate is fixed
       - Just focus on worst slack
       - For interconnect delay dominant sub-circuit
       - Extensions
     • Back-of-envelope filter
       - Logic based cloning: high fan-outs / capacitive load
       - Physical based cloning: special fan-out location distributions

  8. Cloning Problem
     • A sub-circuit
     • Two-pin timing arcs: D = σ · dis(G1, G2)
     • Clone P to P’; find the partitioning of S and the locations of P and P’ that maximize sub-circuit slack
     [Figure: fan-ins F, gate P, fan-outs S]

  9. Cloning Problem (cont.)
     • A sub-circuit
     • Two-pin timing arcs: D = σ · dis(G1, G2)
     • Clone P to P’; find the partitioning of S and the locations of P and P’ that maximize sub-circuit slack
     [Figure: fan-ins F, gates P and P’, fan-outs S]
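
The objective stated on this slide can be made concrete with a small evaluator. This is a minimal sketch, assuming gate delays are ignored and every arc delay is σ times the Manhattan distance; the function names, tuple layout, and coordinates are illustrative and do not come from the slides.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def subcircuit_slack(fanins, groups, sigma):
    """Worst slack of a sub-circuit under the linear delay model.

    fanins: list of ((x, y), arrival_time)
    groups: one entry per gate copy (P, P'), each (gate_xy, sinks),
            where sinks is a list of ((x, y), required_arrival_time)
    sigma:  delay per unit of Manhattan distance (assumed constant)
    """
    worst = float("inf")
    for gate_xy, sinks in groups:
        # Arrival time at this copy: latest fan-in plus wire delay.
        at_gate = max(at + sigma * manhattan(xy, gate_xy) for xy, at in fanins)
        for xy, rat in sinks:
            worst = min(worst, rat - sigma * manhattan(gate_xy, xy) - at_gate)
    return worst


fanins = [((0, 0), 0)]
s1, s2 = ((4, 0), 5), ((0, 4), 5)
no_clone = subcircuit_slack(fanins, [((2, 0), [s1, s2])], 1)
with_clone = subcircuit_slack(fanins, [((2, 0), [s1]), ((0, 2), [s2])], 1)
```

In this made-up instance a single gate at (2, 0) leaves S2 far away, while giving S2 its own copy at (0, 2) removes the detour.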

  10. Cloning Problem (cont.)
      • Reduces to a gate placement problem when the partitioning is given (RUMBLE ISPD08 and Pyramids ICCAD08)
      • Perform real buffering after cloning
      [Figure: fan-ins, gate O, fan-outs]

  11. Arrival Time Arc
      • Each fan-in gate has an arrival time AT(F_i)
      • For each physical point v, AT(v) = max_i ( AT(F_i) + τ · Dis(F_i, v) )
      • The set of points minimizing AT(v) is the arrival time arc K(F)
      • K(F) is either a Manhattan arc or a single point
      • Similar to Deferred Merge Embedding (DME)
      • K(F) is also the bottom of the trough of AT(v) (the overlap of a set of reverse pyramids)
      [Figure: example fan-in sets with arrival times AT = 1, 3, 5 and the resulting K(F)]
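
One way to see why K(F) is a Manhattan arc or a point is to rotate coordinates by 45 degrees (u = x + y, w = x - y), where Manhattan distance becomes max(|du|, |dw|) and AT(v) splits into two one-dimensional problems. The sketch below follows that reasoning; it is an illustration under the linear delay model, not the construction used by the authors, and the function name and input layout are assumptions.

```python
def arrival_time_arc(fanins, tau):
    """Minimize AT(v) = max_i(AT(F_i) + tau * Dis(F_i, v)) over the plane.

    fanins: list of (x, y, arrival_time); tau: delay per unit distance.
    Returns (best_at, end_a, end_b), where end_a and end_b are the (x, y)
    endpoints of the minimizing set (identical when it is a single point).
    """
    def envelope(coords):
        # max_i(at_i + tau*|c - c_i|) = max(a_minus + tau*c, a_plus - tau*c)
        a_minus = max(at - tau * c for c, at in coords)
        a_plus = max(at + tau * c for c, at in coords)
        return a_minus, a_plus

    u_minus, u_plus = envelope([(x + y, at) for x, y, at in fanins])
    w_minus, w_plus = envelope([(x - y, at) for x, y, at in fanins])
    # Each 1-D envelope bottoms out at (a_minus + a_plus) / 2.
    best = max((u_minus + u_plus) / 2, (w_minus + w_plus) / 2)
    # Points whose envelope stays at or below `best` in each rotated axis.
    u_rng = ((u_plus - best) / tau, (best - u_minus) / tau)
    w_rng = ((w_plus - best) / tau, (best - w_minus) / tau)

    def to_xy(u, w):
        return ((u + w) / 2, (u - w) / 2)

    return best, to_xy(u_rng[0], w_rng[0]), to_xy(u_rng[1], w_rng[1])


arc = arrival_time_arc([(0, 0, 1), (2, 2, 1)], 1)
point = arrival_time_arc([(0, 0, 1), (4, 0, 5)], 1)
```

At least one of the two rotated ranges always collapses to a single value, so the result is a segment of slope +1 or -1, or a point. Each fan-in is visited a constant number of times, so the cost is linear in the number of fan-ins.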

  12. Best Region and Best Arrival Time Arc
      • K(S): required arrival time arc (maximizing RAT(v))
      • Best region Z: every point inside this region has maximum sub-circuit slack (constructed with K(F) and K(S))
      • Best arrival time arc B: the intersection of the best region and the arrival time arc
      • Define K(F_i) as the arrival time arc for F_1, …, F_i
      • O(n) time to compute K(F), K(S), Z and B
      • Also O(n) time to compute all K(F_i) and K(S_i), instead of O(n^2) time
      [Figure: example configurations of K(F), K(S), Z and B]
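
The claim that all prefix arcs can be computed in O(n) total is plausible because each prefix only needs a few running maxima. The sketch below illustrates this for the best required arrival time of each sink prefix S_1..S_i, using rotated coordinates u = x + y and w = x - y; it is an assumed realization of the idea, not the authors' procedure, and it returns only the best RAT value rather than the arc itself.

```python
def prefix_best_rat(sinks, tau):
    """For each prefix S_1..S_i, return max over v of
    RAT(v) = min_j(RAT(S_j) - tau * Dis(v, S_j)).

    sinks: list of (x, y, required_arrival_time) in a fixed order.
    Each step updates four running maxima, so the total cost is O(n).
    """
    neg_inf = float("-inf")
    u_minus = u_plus = w_minus = w_plus = neg_inf
    best = []
    for x, y, rat in sinks:
        u, w = x + y, x - y
        # Maximizing RAT(v) equals minimizing max_j(-RAT_j + tau * Dis).
        u_minus = max(u_minus, -rat - tau * u)
        u_plus = max(u_plus, -rat + tau * u)
        w_minus = max(w_minus, -rat - tau * w)
        w_plus = max(w_plus, -rat + tau * w)
        worst = max((u_minus + u_plus) / 2, (w_minus + w_plus) / 2)
        best.append(-worst)
    return best


rats = prefix_best_rat([(0, 0, 5), (4, 0, 5), (2, 2, 5)], 1)
```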

  13. Case 1: P is movable
      • No matter what the partitioning is, one can place P and P’ on the best arrival time arc while still achieving the best slack
      • Divide the whole plane into 6 regions based on slack curves
      [Figure: regions H1 to H6 around K(F), each with a small slack plot over sinks i and j]

  14. Case 1: P is movable (Cont.)
      • If there are no gates in H6: O(n) time algorithm
      [Figure: regions H1 to H6 around K(F), each with a small slack plot over sinks i and j]

  15. Case 1: P is movable (Cont.)
      • If there are gates in H6, treat all slack curves as 3-segment trapezoid-like curves
      • O(n) time algorithm
      [Figure: slack curves over sinks i and j]

  16. Case 1: P is movable (Cont.)
      • If there are gates in H6, treat all slack curves as 3-segment trapezoid-like curves
      • O(n) time algorithm
      [Figure: slack curves over sinks i and j, with the chosen positions of P and P’ marked]

  17. Case 2: P is fixed
      • P may not be on K(F)
      • Slack_P(i) = RAT(i) - τ · Dis(P, i) - AT(P)
      • At most O(n) partitions, since there are only n possible worst slack values for any partitioning
      • Sort S_i accordingly
      • Let P drive the set of fan-outs {S_1}, {S_1, S_2}, {S_1, S_2, S_3}, …
      • O(n log n) time algorithm
      [Figure: plot of Slack_P against Dis(P, i)]
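
The sort-then-enumerate step on this slide can be sketched as follows. Sinks are ordered by Slack_P from best to worst, P keeps a prefix, and the remainder goes to the clone. How the clone's best slack is obtained is left to a caller-supplied function `clone_slack`, which is a hypothetical placeholder: calling it from scratch for every prefix costs more than the O(n log n) reported on the slide, which would need incremental updates that this sketch does not attempt.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def best_prefix_partition(p_xy, at_p, sinks, tau, clone_slack):
    """Try the n prefix partitions for a fixed gate P.

    sinks:       list of (name, (x, y), required_arrival_time)
    clone_slack: hypothetical callback; given a list of sinks in the same
                 layout, returns the best slack the clone P' can give them
    Returns (worst_slack, names_driven_by_P, names_driven_by_clone).
    """
    scored = sorted(
        ((rat - tau * manhattan(p_xy, xy) - at_p, name, xy, rat)
         for name, xy, rat in sinks),
        key=lambda item: item[0],
        reverse=True,  # best Slack_P first, so P keeps the easy sinks
    )
    best = None
    for k in range(1, len(scored) + 1):
        p_slack = scored[k - 1][0]  # worst slack among P's k sinks
        rest = [(name, xy, rat) for _, name, xy, rat in scored[k:]]
        q_slack = clone_slack(rest) if rest else float("inf")
        total = min(p_slack, q_slack)
        if best is None or total > best[0]:
            best = (total, [s[1] for s in scored[:k]], [s[0] for s in rest])
    return best


# Illustrative callback: the clone sits at a fixed spot with a known AT.
def fixed_clone(rest, q_xy=(5, 0), at_q=1, tau=1):
    return min(rat - tau * manhattan(q_xy, xy) - at_q for _, xy, rat in rest)


result = best_prefix_partition(
    (0, 0), 0, [("S1", (1, 0), 5), ("S2", (6, 0), 5)], 1, fixed_clone)
```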

  18. One Example
      [Figure: two panels, "Original circuit" and "After buffering", each with fan-ins F1, F2, gate P, and fan-outs S1, S2. Slack labels in order of appearance: -1.2, -0.6]

  19. One Example (cont.)
      [Figure: two panels, "RUMBLE" and "P is fixed", with fan-ins F1, F2, gates P and P’, a buffer, and fan-outs S1, S2. Slack labels in order of appearance: -0.8, 0, -0.5, -0.8]

  20. One Example (cont.)
      [Figure: two panels, "P is movable" and "Bad wirelength solution", with fan-ins F1, F2, gates P and P’, and fan-outs S1, S2. Slack labels in order of appearance: 0, 0, 0, -0.5]

  21. Experimental Results
      • 100 random 65 nm sub-circuits
        - P is fixed: 279 ps better than pure buffering, 87 ps better than RUMBLE on average
        - P is movable: 309 ps better than pure buffering, 117 ps better than RUMBLE on average
      • 65 nm macros ("Single transform" = cloning applied alone; "vs. flow" = compared to a flow with pure buffering):

        Macro   | # objs | Single transform | Single transform | vs. flow     | vs. flow   | Area
                |        | slack imprv.     | FOM imprv.       | slack imprv. | FOM imprv. | increase
        Macro 1 | 91k    | 0.480 ns         | 438              | 0.097 ns     | -8         | 0.5%
        Macro 2 | 231k   | 0.098 ns         | 0                | 0.081 ns     | 200        | 0.8%
        Macro 3 | 191k   | 0.383 ns         | 2837             | 0.124 ns     | 280        | 1%

  22. Extensions
      • Duplicate more than two gates
        - O(n^2) algorithm
      • Be smart about Z regions
        - Latches
        - Blockages
        - Wire-length
        - FOM extension
      [Figure labels: K(S), Z, B, blockage, K(F)]

  23. Voronoi Diagram Partitioning
      • Best slack for every sink with blockages
      • If we know the locations of P and P’
        - The optimal partitioning is the Voronoi diagram between two points, or between a point and a diamond, in Manhattan space
        - Only O(n^3) possible partitionings
        - Try all partitionings and find the best one
      [Figure: gates P and P’ with sinks A and B; numeric labels 5 and 1]
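
When P and P’ are already placed, the partitioning step amounts to handing each sink to whichever copy reaches it earlier. The sketch below is one simple reading of that idea, assuming the linear delay model and no blockages; with equal arrival times at both copies it is a plain nearest-gate split in Manhattan distance. The function name and argument layout are illustrative.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def voronoi_partition(p_xy, at_p, q_xy, at_q, sinks, tau):
    """Assign each sink to the gate copy that reaches it earlier.

    p_xy, q_xy: locations of P and its clone P'
    at_p, at_q: arrival times at P and P'
    sinks:      dict mapping sink name to (x, y)
    The sink's RAT cancels when the two slacks are compared, so only the
    arrival time at the sink through each copy matters. Ties go to P.
    """
    own_p, own_q = [], []
    for name, xy in sinks.items():
        via_p = at_p + tau * manhattan(p_xy, xy)
        via_q = at_q + tau * manhattan(q_xy, xy)
        (own_p if via_p <= via_q else own_q).append(name)
    return own_p, own_q


sinks = {"A": (2, 1), "B": (8, 1)}
split = voronoi_partition((0, 0), 0, (10, 0), 0, sinks, 1)
late_clone = voronoi_partition((0, 0), 0, (10, 0), 20, sinks, 1)
```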

