
INTEGRA: Fast Multi-Bit Flip-Flop Clustering for Clock Power Saving Based on Interval Graphs



  1. INTEGRA: Fast Multi-Bit Flip-Flop Clustering for Clock Power Saving Based on Interval Graphs
     Iris Hui-Ru Jiang, Chih-Long Chang, Yu-Ming Yang (NCTU); Evan Yu-Wen Tsai, Lancer Sheng-Fong Chen
     IRIS Lab, Nat'l Chiao Tung Univ. / Faraday Tech Corp.

  2. Outline
     - Introduction
     - Problem & properties
     - Algorithm - INTEGRA
     - Experimental results
     - Conclusion
     INTEGRA - ISPD'11

  3. Clock Power Dominates!
     - Power has become one bottleneck for circuit implementation
     - Clock power is the major dynamic power source
       - The clock signal toggles in each cycle
       - High switching activity
     - Clock power model: dynamic power
       - $P_{clk} = C_{clk} V_{dd}^{2} f_{clk}$
       - $C_{clk}$: switching capacitance charged/discharged by clock
     [Figure: flip-flops and combinational circuit driven from the clock root through the clock network ($C_{clk}$); power breakdown of an ASIC, with clock tree power at 27%.]
     Reference: Chen et al. Using multi-bit flip-flop for clock power saving by DesignCompiler. SNUG, 2010.
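The clock power model on slide 3 can be evaluated directly. The sketch below is only an illustration: the capacitance, voltage, and frequency values are hypothetical and do not come from the slides.

```python
def clock_dynamic_power(c_clk_farads, v_dd_volts, f_clk_hertz):
    """Dynamic clock power in watts: P_clk = C_clk * V_dd^2 * f_clk."""
    return c_clk_farads * v_dd_volts ** 2 * f_clk_hertz


# Hypothetical numbers (assumptions, not from the talk):
# a 100 pF clock network at 1.0 V and 1 GHz, then the same network
# after its switching capacitance has been reduced to 80 pF.
baseline = clock_dynamic_power(100e-12, 1.0, 1e9)
reduced = clock_dynamic_power(80e-12, 1.0, 1e9)

# Because power is linear in C_clk, the relative saving equals the
# relative reduction in switching capacitance.
saving = 1.0 - reduced / baseline
```

Since voltage and frequency are unchanged, any reduction in $C_{clk}$ translates one-to-one into clock power saving under this model.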

  4. Multi-Bit Flip-Flops
     - A multi-bit flip-flop (MBFF)
       - Cluster several single-bit flip-flops (share the drive strength)
     - Save flip-flop power and area
     [Figure: a single-bit flip-flop (master latch and slave latch between D and Q, clk with high drive strength) beside a dual-bit flip-flop (two master/slave latch pairs, D1/Q1 and D2/Q2, on a shared clk).]

     | Bit number | Normalized power per bit | Normalized area per bit |
     |------------|--------------------------|-------------------------|
     | 1          | 1.000                    | 1.000                   |
     | 2          | 0.860                    | 0.960                   |
     | 4          | 0.780                    | 0.713                   |

  5. Clock Power Saving using MBFFs (1/2)
     - Reduce switching capacitance charged/discharged by clock

     | Switching capacitance                | Clock power saving                                          | Other benefits                         |
     |--------------------------------------|-------------------------------------------------------------|----------------------------------------|
     | Clock sinks (flip-flops)             | Small FF capacitance: share C into FF clock pins            | Small area: share the inverter chain   |
     | Clock network (wires, clock buffers) | Small wire/buf capacitance: #leaf ↓, depth ↓, #buffer ↓     | Regular topology and easy skew control |

     [Figure: two clock roots, labeled $8\,C_{FF}$ and $3\,C_{FF}$.]
     Reference: Pokala et al. Physical synthesis for performance optimization. ASIC, 1992.

  6. Clock Power Saving using MBFFs (2/2)
     - Clock power reduction can be significant
       - FF clock pins, clock buffers/inverters, wires in clock network
     - Wire power overhead on data pins is small
       - Wirelength on data pins << total wirelength
     [Figure: flip-flops and combinational circuit driven from the clock root through the clock network; clock tree power 27%.]

  7. Prior Works on MBFF Clustering
     - Logic synthesis
       - [Chen et al., SNUG-10]
     - Early physical synthesis
       - [Hou et al., ISQED-09]
     - Post-placement: timing and routing
       - [Yan and Chen, ICGCS-10]
         - Minimum clique partitioning
         - Greedy clustering
         - Contiguous and infinite MBFF library
       - [Chang et al., ICCAD-10]
         - Window-based clustering
         - Maximum independent set
         - Discrete and finite MBFF library
     [Figure: design flow: Logic synthesis w/ MBFF clustering, Placement, Timing analysis, Post-placement MBFF clustering, Legalization, Clock tree synthesis, Routing.]

  8. INTEGRA
     - Since post-placement MBFF clustering is NP-hard, our goal is to solve it effectively and efficiently instead of optimally.
       - Do not enumerate all possible combinations (maximal cliques)
       - Do not depend on the number of layout grids/bins
       - Do not operate on a general graph
     - Features:
       - Efficient representation: a pair of linear-size sequences
       - Fast operations: coordinate transformation
       - Few decision points: #decision points << #flip-flops
         - We cluster flip-flops only at decision points, which leads to an efficient clustering scheme.
       - Global relationships among flip-flops: cross bin boundaries

  9. Outline
     - Introduction
     - Problem & properties
     - Algorithm - INTEGRA
     - Experimental results
     - Conclusion

  10. The Multi-Bit Flip-Flop Clustering Problem
      - Clock power saving using multi-bit flip-flops
      - Given
        - MBFF library
        - Netlist & placement
        - Timing slack constraints (in terms of wirelength)
        - Placement density constraint
      - Find
        - MBFF clustering to
      - Minimize
        - Clock dynamic power
        - Wirelength
      - Subject to
        - Timing slack constraints (in terms of wirelength)
        - Placement density constraints
      [Figure: flip-flops fed by the clock network; labels WL and Power.]

  11. MBFF Library
      - MBFF library
        - Lexicographical order: <1,100,100>, <2,172,192>, <4,312,285>

      | Bit number | Power | Area | Normalized power per bit | Normalized area per bit |
      |------------|-------|------|--------------------------|-------------------------|
      | 1          | 100   | 100  | 1.00                     | 1.00                    |
      | 2          | 172   | 192  | 0.86                     | 0.96                    |
      | 4          | 312   | 285  | 0.78                     | 0.71                    |
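The normalized columns of the slide 11 library follow from the raw power and area figures. A minimal sketch, assuming each library entry is a `<bits, power, area>` triple as listed on the slide; the names `MbffCell`, `LIBRARY`, and `normalized_per_bit` are made up for this illustration.

```python
from typing import NamedTuple


class MbffCell(NamedTuple):
    bits: int
    power: int
    area: int


# Library entries from slide 11; sorting tuples gives lexicographical order.
LIBRARY = sorted([
    MbffCell(4, 312, 285),
    MbffCell(1, 100, 100),
    MbffCell(2, 172, 192),
])


def normalized_per_bit(cell, base):
    """Per-bit power and area of `cell`, relative to the single-bit cell `base`."""
    power = cell.power / cell.bits / base.power
    area = cell.area / cell.bits / base.area
    return power, area


BASE = LIBRARY[0]  # the 1-bit flip-flop
NORMALIZED = {cell.bits: normalized_per_bit(cell, BASE) for cell in LIBRARY}
```

Rounded to two decimals, `NORMALIZED` reproduces the last two columns of the table.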

  12. Placement
      - Chip area = $W_c H_c$ bins = $W H$ grids, with $W = W_c W_b$ and $H = H_c H_b$
      - Flip-flops should be placed on grid (left-bottom corner)
      - Placement density constraint for bin $b$:
        - $A_{fb} \le T_b (W_b H_b A_g - A_{pb}) - A_{cb}$
        - $A_{fb}$: FF area
        - $A_{cb}$: combinational logic area
        - $A_{pb}$: macro area, $A_{pb} = A_{px} A_{py}$
        - $A_g$: grid area
        - $T_b$: target density
      [Figure: chip of $W_c \times H_c$ bins; one bin of $W_b \times H_b$ grids containing a macro of size $A_{px} \times A_{py}$; grid points marked.]
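The density constraint on slide 12 amounts to a per-bin area budget for flip-flops. The sketch below shows one way to check it; the function names and all numeric values are assumptions for illustration, not part of the talk.

```python
def ff_area_budget(t_b, w_b, h_b, a_g, a_pb, a_cb):
    """Right-hand side of the constraint: T_b * (W_b * H_b * A_g - A_pb) - A_cb."""
    return t_b * (w_b * h_b * a_g - a_pb) - a_cb


def density_ok(a_fb, t_b, w_b, h_b, a_g, a_pb, a_cb):
    """True when the flip-flop area A_fb in the bin fits within the budget."""
    return a_fb <= ff_area_budget(t_b, w_b, h_b, a_g, a_pb, a_cb)


# Hypothetical bin: 10 x 10 grids of unit area, a macro covering 20 units,
# 40 units of combinational logic, and a target density of 0.8.
BIN = dict(t_b=0.8, w_b=10, h_b=10, a_g=1.0, a_pb=20.0, a_cb=40.0)
budget = ff_area_budget(**BIN)   # 0.8 * (100 - 20) - 40, about 24 area units
fits = density_ok(20.0, **BIN)   # within the budget
too_big = density_ok(30.0, **BIN)  # exceeds the budget
```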

  13. Timing Slack and Feasible Region
      - Slack is expressed in terms of wirelength
      - Feasible region $F_r(i)$: bounded by edges of slope +1 and -1 (a diamond), sized by the slacks $S_{fi}(i)$ and $S_{fo}(i)$ toward the fanin gate and the fanout gate
      - Multiple-fanout: multiple fanout diamonds
      [Figure: flip-flop $i$ between its fanin gate and fanout gate, with the input slack and the feasible region $F_r(i)$ marked.]
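One way to read slide 13 is that each connected pin contributes a diamond (a Manhattan-distance ball whose radius is the slack converted to wirelength), and the feasible region is the intersection of those diamonds. The sketch below tests a point under that assumption; the `(cx, cy, slack)` pin format and the function names are hypothetical.

```python
def in_diamond(x, y, cx, cy, slack):
    """True if (x, y) is within Manhattan distance `slack` of the pin at (cx, cy)."""
    return abs(x - cx) + abs(y - cy) <= slack


def in_feasible_region(x, y, pins):
    """True if (x, y) lies inside every pin's diamond.

    `pins` is an iterable of (cx, cy, slack) triples: an assumed encoding
    of the fanin and fanout constraints, not a format given in the slides.
    """
    return all(in_diamond(x, y, cx, cy, slack) for cx, cy, slack in pins)


# Hypothetical flip-flop with one fanin pin and one fanout pin.
PINS = [(2, 3, 4), (6, 3, 4)]
inside = in_feasible_region(4, 3, PINS)   # distance 2 to both pins
outside = in_feasible_region(0, 3, PINS)  # distance 6 to the second pin
```

Every membership test here costs one Manhattan-distance evaluation per pin, which is the check the following slides simplify by rotating the coordinate system.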

  14. Coordinate Transformation (1/3)
      - It's hard to determine if a grid point is located inside or outside the feasible region $F_r(i)$
      - Rotate 45° clockwise; we have rectangles instead
        - Easy checking!
      [Figure: in the x-y plane, the feasible region $F_r(i)$ set by the fanin gate $f_i(i)$ with slack $S_{fi}(i)$ and the fanout gate $f_o(i)$ with slack $S_{fo}(i)$; in the x'-y' plane, the same region as a rectangle spanning the interval $I_{x'}(i)$ from $x' = s_{x'}(i)$ to $x' = e_{x'}(i)$ and the interval $I_{y'}(i)$ from $y' = s_{y'}(i)$ to $y' = e_{y'}(i)$.]

  15. Coordinate Transformation (2/3)
      - Coordinate transformation is done by integer operations
        - $x' = y + x$, $y' = y - x$
        - $x = (x' - y')/2$, $y = (x' + y')/2$
        - Scaling factor: $\sqrt{2}$
      - Bin corners in the original system $C$ and the transformed system $C'$:
        - $(0, 0)_C = (0, 0)_{C'}$
        - $(W, 0)_C = (W, -W)_{C'}$
        - $(0, H)_C = (H, H)_{C'}$
        - $(W, H)_C = (H + W, H - W)_{C'}$
      [Figure: a bin and its grid rotated by $\pi/4$ between the x-y and x'-y' axes; grid points and non-grid points marked.]
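The transformation on slide 15 needs only integer additions, subtractions, and a halving. A minimal sketch follows; the function names are made up here, and the parity test in `is_grid_point` is an observation derived from the inverse formulas rather than something stated on the slide.

```python
def to_rotated(x, y):
    """Map (x, y) to (x', y') with x' = y + x and y' = y - x."""
    return y + x, y - x


def is_grid_point(xp, yp):
    """True if (x', y') maps back to integer (x, y).

    Derived from x = (x' - y')/2 and y = (x' + y')/2: both are integers
    exactly when x' and y' have the same parity.
    """
    return (xp - yp) % 2 == 0


def from_rotated(xp, yp):
    """Inverse map; only meaningful when is_grid_point(xp, yp) holds."""
    return (xp - yp) // 2, (xp + yp) // 2


# Bin corners with hypothetical size W = 4, H = 3.
W, H = 4, 3
corners = [to_rotated(0, 0), to_rotated(W, 0), to_rotated(0, H), to_rotated(W, H)]
```

With these values `corners` matches the corner mapping on the slide: (0, 0), (W, -W), (H, H), and (H + W, H - W).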

  16. Coordinate Transformation (3/3)
      [Figure: in the x-y plane, a diamond centered at $(x_0, y_0)$ with corners $(x_0, y_0 + S)$, $(x_0 + S, y_0)$, $(x_0, y_0 - S)$, and $(x_0 - S, y_0)$; the feasible region $F_r(i)$ with $f_i(i)$ and $f_o(i)$, and the regions $F_r(j_1)$, $F_r(j_2)$, $F_r(j_3)$ for $S_j = \{j_1, j_2, j_3\}$. In the x'-y' plane, the diamond becomes a square of side $2S$ centered at $(x'_0, y'_0)$, bounded by $x' = x'_0 - S$, $x' = x'_0 + S$, $y' = y'_0 - S$, and $y' = y'_0 + S$; the intervals $I_{x'}(i)$ (from $s_{x'}(i)$ to $e_{x'}(i)$), $I_{y'}(i)$ (from $s_{y'}(i)$ to $e_{y'}(i)$), $I_{x'}(j)$, and $I_{y'}(j)$ are marked.]
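Slide 16 shows a diamond of radius S turning into an axis-aligned square of side 2S after the transformation, so intersecting several diamonds reduces to intersecting intervals on x' and y'. The sketch below illustrates that idea under the same assumed `(cx, cy, slack)` pin encoding; it is not the authors' code.

```python
def diamond_to_square(cx, cy, slack):
    """Return the x' and y' intervals of the diamond centered at (cx, cy)."""
    xp0, yp0 = cy + cx, cy - cx  # transformed center (x'0, y'0)
    return (xp0 - slack, xp0 + slack), (yp0 - slack, yp0 + slack)


def feasible_intervals(pins):
    """Intersect the squares of all pins.

    `pins` is a non-empty list of (cx, cy, slack) triples (assumed format).
    Returns ((s_x', e_x'), (s_y', e_y')), or None if the intersection is empty.
    """
    (sx, ex), (sy, ey) = diamond_to_square(*pins[0])
    for cx, cy, slack in pins[1:]:
        (ax, bx), (ay, by) = diamond_to_square(cx, cy, slack)
        sx, ex = max(sx, ax), min(ex, bx)
        sy, ey = max(sy, ay), min(ey, by)
    if sx > ex or sy > ey:
        return None
    return (sx, ex), (sy, ey)


# Hypothetical pins: two diamonds of radius 4.
region = feasible_intervals([(2, 3, 4), (6, 3, 4)])
empty = feasible_intervals([(0, 0, 1), (10, 10, 1)])
```

For the first pin list the result is the rectangle x' in [5, 9], y' in [-3, 1]; the point (4, 3) maps to (7, -1), which lies inside it.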

  17. Outline
      - Introduction
      - Problem & properties
      - Algorithm - INTEGRA
      - Experimental results
      - Conclusion

  18. Overview of INTEGRA
      1. Analyzes the design intent
      2. Finds a decision point in X' and extracts the essential flip-flops and their related flip-flops
      3. Finds the maximal clique in the partial Y' for each essential flip-flop
      4. Clusters each essential flip-flop
      5. Places the clustered flip-flop at a legal location with routing cost and density consideration
      6. Repeats steps 2–5 until all flip-flops are investigated
      [Figure: flowchart: Initialization, Flip-flop clustering, Flip-flop placement, "Any more FFs?" (Y: repeat; N: Done).]
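The loop in steps 2–4 can be sketched in a heavily simplified form. Everything below is an assumption made for illustration, not the authors' implementation: the decision point is taken as the smallest remaining X' end point, the essential flip-flop is the one ending there, related flip-flops are those whose X' interval has already started, and the clique in Y' is found by probing interval start points. Placement (step 5), initialization details, and cost-driven member selection are omitted.

```python
def cluster_flip_flops(rects, bit_sizes=(1, 2, 4)):
    """Greedy clustering sketch.

    rects: {name: ((sx, ex), (sy, ey))}, feasible intervals on x' and y'.
    bit_sizes: available MBFF bit widths; must include 1.
    Returns a list of clusters, each a list of names (essential flip-flop first).
    """
    remaining = set(rects)
    clusters = []
    while remaining:
        # Step 2 (assumed reading): decision point = smallest X' end point.
        essential = min(remaining, key=lambda f: (rects[f][0][1], f))
        decision = rects[essential][0][1]
        # Related flip-flops: X' interval already started at the decision point.
        related = sorted(f for f in remaining
                         if f != essential and rects[f][0][0] <= decision)

        # Step 3 (assumed reading): a largest clique containing the essential
        # flip-flop in Y' is found at some interval start point inside its
        # Y' interval.
        ys, ye = rects[essential][1]
        probes = sorted({ys} | {rects[f][1][0] for f in related
                                if ys <= rects[f][1][0] <= ye})

        def covering(p):
            return [f for f in related if rects[f][1][0] <= p <= rects[f][1][1]]

        members = max((covering(p) for p in probes), key=len)

        # Step 4: use the largest library cell that can be filled.
        size = max(b for b in bit_sizes if b <= 1 + len(members))
        cluster = [essential] + members[:size - 1]
        clusters.append(cluster)
        remaining -= set(cluster)
    return clusters


# Hypothetical rectangles, not the example from the slides.
RECTS = {
    "A": ((0, 4), (0, 5)),
    "B": ((1, 3), (2, 7)),
    "C": ((2, 8), (4, 9)),
    "D": ((8, 10), (7, 10)),
}
CLUSTERS = cluster_flip_flops(RECTS)
```

All members of a cluster contain the decision point on x' and the chosen probe on y', so their feasible rectangles share at least one common point. In this run B is clustered with A even though C also overlaps, because the assumed library has no 3-bit cell.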

  19. Example (1/5)
      [Figure: eight flip-flops, FF0 to FF7, shown in the initial x-y plane and in the transformed x'-y' plane; axes run from 0 to 10.]

  20. Example (2/5) - Representation
      - Two interval graphs
      - Intervals on x' (listed against labels 0 to 7): [0,4], [1,3], [0,7], [1,9], [4,6], [0,9], [8,10], [2,8]
      - Intervals on y': [7,10], [0,10], [5,9], [1,2], [0,5], [2,7], [7,8], [4,9]
      [Figure: FF0 to FF7 in the transformed plane, with their feasible intervals projected onto the x' axis and the y' axis (both 0 to 10).]
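The two interval graphs on slide 20 can each be stored as a sorted sequence of 2n end points, which is one plausible reading of "a pair of linear-size sequences". A minimal sketch, assuming that two flip-flops are clustering candidates when their intervals overlap on both x' and y'; the data and names below are hypothetical and do not reproduce the slide's example.

```python
def endpoint_sequence(intervals):
    """Sorted (coordinate, kind, name) events; kind 0 = start, 1 = end.

    Starts sort before ends at the same coordinate, so intervals that
    merely touch are still treated as overlapping.
    """
    events = []
    for name, (start, end) in intervals.items():
        events.append((start, 0, name))
        events.append((end, 1, name))
    return sorted(events)


def overlaps(p, q):
    """True if closed intervals p and q share at least one point."""
    return p[0] <= q[1] and q[0] <= p[1]


def can_share_location(a, b, x_int, y_int):
    """True if a and b overlap on both x' and y' (their rectangles intersect)."""
    return overlaps(x_int[a], x_int[b]) and overlaps(y_int[a], y_int[b])


# Hypothetical intervals for three flip-flops.
X_INT = {0: (0, 4), 1: (1, 3), 2: (8, 10)}
Y_INT = {0: (0, 5), 1: (2, 7), 2: (7, 10)}

X_SEQ = endpoint_sequence(X_INT)
Y_SEQ = endpoint_sequence(Y_INT)
first_end = next(event for event in X_SEQ if event[1] == 1)
```

Scanning `X_SEQ` from the left until the first end event is a cheap way to locate a candidate decision point without building an explicit graph.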
