INTEGRA: Fast Multi-Bit Flip-Flop Clustering for Clock Power Saving Based on Interval Graphs
  Iris Hui-Ru Jiang, Chih-Long Chang, Yu-Ming Yang, Evan Yu-Wen Tsai, Lancer Sheng-Fong Chen
  IRIS Lab, Nat’l Chiao Tung Univ. (NCTU) / Faraday Tech Corp.
Outline
  - Introduction
  - Problem & properties
  - Algorithm - INTEGRA
  - Experimental results
  - Conclusion
  INTEGRA - ISPD'11
Clock Power Dominates!
  - Power has become one bottleneck for circuit implementation.
  - Clock power is the major dynamic power source.
  - The clock signal toggles in each cycle: high switching activity.
  - Clock power model (dynamic power, $P = C V^2 f$): $P_{clk} = C_{clk} V_{dd}^2 f_{clk}$
  - $C_{clk}$: switching capacitance charged/discharged by the clock.
  - Figure: flip-flops and combinational circuits driven from the clock root through the clock network ($C_{clk}$).
  - Figure: power breakdown of an ASIC; the clock tree accounts for 27% of the power.
  - Chen et al. Using multi-bit flip-flop for clock power saving by DesignCompiler. SNUG, 2010.
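The clock power model above can be tried out with a few lines of Python. This is only an illustrative sketch: the function name `clock_dynamic_power` and all numeric values are assumptions chosen for the example, not figures from the slides.

```python
def clock_dynamic_power(c_clk_farads, v_dd_volts, f_clk_hertz):
    """Return P_clk = C_clk * V_dd^2 * f_clk in watts."""
    return c_clk_farads * v_dd_volts ** 2 * f_clk_hertz


# Hypothetical values, for illustration only (not from the slides).
baseline = clock_dynamic_power(100e-12, 1.0, 500e6)  # 100 pF, 1.0 V, 500 MHz
reduced = clock_dynamic_power(80e-12, 1.0, 500e6)    # 20% less switching capacitance

# With V_dd and f_clk fixed, power scales linearly with C_clk.
saving = 1 - reduced / baseline
```

Because the model is linear in $C_{clk}$, any relative reduction in switching capacitance gives the same relative reduction in clock dynamic power at a fixed supply voltage and frequency.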
Multi-Bit Flip-Flops
  - A multi-bit flip-flop (MBFF) clusters several single-bit flip-flops (sharing the drive strength).
  - Figure: a single-bit flip-flop (master latch, slave latch; D, Q, clk) next to a dual-bit flip-flop (two master/slave latch pairs, D1/Q1 and D2/Q2, sharing a clk driver with high drive strength).
  - MBFFs save flip-flop power and area:

  Bit number                  1       2       4
  Normalized power per bit    1.000   0.860   0.780
  Normalized area per bit     1.000   0.960   0.713
Clock Power Saving using MBFFs (1/2)
  - Reduce the switching capacitance charged/discharged by the clock.

  Switching capacitance                  Clock power saving                                    Other benefits
  Clock sinks (flip-flops)               Small FF capacitance: share C into FF clock pins      Small area: share the inverter chain
  Clock network (wires, clock buffers)   Small wire/buf capacitance: #leaf, depth, #buffer     Regular topology and easy skew control

  - Figure: clock trees from the clock root, labeled 8 C_FF and 3 C_FF.
  - Pokala et al. Physical synthesis for performance optimization. ASIC, 1992.
Clock Power Saving using MBFFs (2/2)
  - Clock power reduction can be significant: FF clock pins, clock buffers/inverters, and wires in the clock network.
  - Wire power overhead on data pins is small: wirelength on data pins << total wirelength.
  - Figure: flip-flops and combinational circuits fed by the clock network from the clock root; clock tree power is 27%.
Prior Works on MBFF Clustering
  - Design flow: logic synthesis, placement, timing analysis, post-placement MBFF clustering, legalization, clock tree synthesis, routing.
  - Logic synthesis w/ MBFF clustering: [Chen et al., SNUG-10]
  - Early physical synthesis: [Hou et al., ISQED-09]
  - Post-placement (timing and routing):
      [Yan and Chen, ICGCS-10]: minimum clique partitioning; greedy clustering; contiguous and infinite MBFF library
      [Chang et al., ICCAD-10]: window-based clustering; maximum independent set; discrete and finite MBFF library
INTEGRA
  - Since post-placement MBFF clustering is NP-hard, our goal is to solve it effectively and efficiently instead of optimally.
      Do not enumerate all possible combinations (maximal cliques).
      Do not relate to the number of layout grids/bins.
      Do not manipulate a general graph.
  - Features:
      Efficient representation: a pair of linear-size sequences
      Fast operations: coordinate transformation
      Few decision points: #decision points << #flip-flops
      Global relationships among flip-flops: cross bin boundaries
  - We cluster flip-flops only at decision points, which leads to an efficient clustering scheme.
Outline
  - Introduction
  - Problem & properties
  - Algorithm - INTEGRA
  - Experimental results
  - Conclusion
The Multi-Bit Flip-Flop Clustering Problem
  - Clock power saving using multi-bit flip-flops.
  - Given:
      MBFF library
      Netlist & placement
      Timing slack constraints (in terms of wirelength)
      Placement density constraint
  - Find an MBFF clustering to minimize:
      Clock dynamic power
      Wirelength
  - Subject to:
      Timing slack constraints (in terms of wirelength)
      Placement density constraints
  - Figure: flip-flops on the clock network, annotated with power and wirelength (WL).
MBFF Library
  - MBFF library in lexicographical order: <1,100,100>, <2,172,192>, <4,312,285>

  Bit number   Power   Area   Normalized power per bit   Normalized area per bit
  1            100     100    1.00                       1.00
  2            172     192    0.86                       0.96
  4            312     285    0.78                       0.71
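The normalized columns of the library table follow from the raw power and area values. A minimal sketch, assuming normalization is per bit and relative to the single-bit cell; the names `LIBRARY` and `normalized_per_bit` are illustrative, not from the slides.

```python
# (bit number, power, area) triples copied from the slide's MBFF library.
LIBRARY = [(4, 312, 285), (1, 100, 100), (2, 172, 192)]


def normalized_per_bit(library):
    """Return (bits, power per bit, area per bit), normalized to the smallest cell."""
    cells = sorted(library)  # tuple comparison gives the lexicographical order
    base_bits, base_power, base_area = cells[0]
    rows = []
    for bits, power, area in cells:
        power_per_bit = (power / bits) / (base_power / base_bits)
        area_per_bit = (area / bits) / (base_area / base_bits)
        rows.append((bits, power_per_bit, area_per_bit))
    return rows


rows = normalized_per_bit(LIBRARY)
for bits, p, a in rows:
    print(f"{bits}-bit: power/bit = {p:.2f}, area/bit = {a:.2f}")
```

Sorting the triples first reproduces the lexicographical order shown on the slide, so the single-bit cell is always the normalization baseline.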
Placement
  - Chip area = $W_c \times H_c$ bins = $W \times H$ grids, with $W = W_c W_b$ and $H = H_c H_b$.
  - Flip-flops should be placed on grid points (left-bottom corner).
  - Placement density constraint for bin $b$: $A_{fb} \le T_b (W_b H_b A_g - A_{pb}) - A_{cb}$
      $A_{fb}$: FF area
      $A_{cb}$: combinational logic area
      $A_{pb}$: macro area, $A_{pb} = A_{px} A_{py}$
      $A_g$: grid area
      $T_b$: target density
  - Figure: a chip of $W_c \times H_c$ bins, each bin of $W_b \times H_b$ grids, with a macro of size $A_{px} \times A_{py}$ and a marked grid point.
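The density constraint can be checked per bin with simple arithmetic. The sketch below is an assumption-laden illustration: the function names, argument order, and the numeric values are hypothetical, and only the inequality itself comes from the slide.

```python
def ff_area_budget(t_b, w_b, h_b, a_g, a_pb, a_cb):
    """Right-hand side of the constraint: T_b * (W_b * H_b * A_g - A_pb) - A_cb."""
    return t_b * (w_b * h_b * a_g - a_pb) - a_cb


def density_ok(a_fb, t_b, w_b, h_b, a_g, a_pb, a_cb):
    """True if the flip-flop area A_fb in bin b satisfies the density constraint."""
    return a_fb <= ff_area_budget(t_b, w_b, h_b, a_g, a_pb, a_cb)


# Hypothetical bin: 10 x 10 grids of unit area, target density 0.8,
# macro area 20, combinational logic area 40.
budget = ff_area_budget(0.8, 10, 10, 1.0, 20.0, 40.0)
fits = density_ok(20.0, 0.8, 10, 10, 1.0, 20.0, 40.0)
overflows = density_ok(30.0, 0.8, 10, 10, 1.0, 20.0, 40.0)
```

A clustering step that moves flip-flops into a bin would compare the bin's new flip-flop area against this budget before accepting the move.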
Timing Slack and Feasible Region
  - The input slack of flip-flop $i$ is expressed as a slack wirelength, which defines its feasible region $F_r(i)$.
  - Figure: diamonds with edges of slope +1 and -1 around the fanin gate (slack $S_{fi}(i)$) and the fanout gate (slack $S_{fo}(i)$) of flip-flop $i$, with the feasible region $F_r(i)$.
  - Multiple fanouts: multiple fanout diamonds.
Coordinate Transformation (1/3)
  - It’s hard to determine if a grid point is located inside or outside the feasible region $F_r(i)$.
  - Rotate 45° clockwise; we have rectangles instead. Easy checking!
  - In $(x', y')$, $F_r(i)$ is bounded by $x' = s_{x'}(i)$, $x' = e_{x'}(i)$, $y' = s_{y'}(i)$, and $y' = e_{y'}(i)$, giving the intervals $I_{x'}(i)$ and $I_{y'}(i)$.
  - Figure: $F_r(i)$ with the fanin gate $f_i(i)$ (slack $S_{fi}(i)$) and the fanout gate $f_o(i)$ (slack $S_{fo}(i)$), in $(x, y)$ and in $(x', y')$.
Coordinate Transformation (2/3)
  - Coordinate transformation is done by integer operations:
      $x' = y + x$, $y' = y - x$
      $x = (x' - y')/2$, $y = (x' + y')/2$
  - Scaling factor: $1_C = \sqrt{2}_{C'}$
  - Chip corners, $C \to C'$:
      $(0, 0)_C = (0, 0)_{C'}$
      $(W, 0)_C = (W, -W)_{C'}$
      $(0, H)_C = (H, H)_{C'}$
      $(W, H)_C = (H+W, H-W)_{C'}$
  - Figure: bins and grids of $C$ rotated by $\pi/4$ in $C'$, with grid points and non-grid points marked.
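The transformation and its inverse can be written with integer arithmetic only. This is a minimal sketch; the function names are illustrative, and the parity test for grid points is a direct consequence of the formulas ($x' + y' = 2y$) rather than something stated on the slide.

```python
def to_rotated(x, y):
    """Map a grid point (x, y) in C to (x', y') in C'."""
    return y + x, y - x


def is_grid_point(xp, yp):
    """An integer point of C' is the image of a grid point of C iff x' + y' is even."""
    return (xp + yp) % 2 == 0


def from_rotated(xp, yp):
    """Map (x', y') in C' back to (x, y) in C; only valid for images of grid points."""
    if not is_grid_point(xp, yp):
        raise ValueError("(x', y') does not correspond to a grid point of C")
    return (xp - yp) // 2, (xp + yp) // 2


# Chip corners for a hypothetical W = 4, H = 3 chip.
W, H = 4, 3
corners = [to_rotated(0, 0), to_rotated(W, 0), to_rotated(0, H), to_rotated(W, H)]
```

For `W = 4` and `H = 3` the corners land on `(0, 0)`, `(W, -W)`, `(H, H)`, and `(H + W, H - W)`, matching the corner mapping on the slide.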
Coordinate Transformation (3/3)
  - In $(x, y)$, a diamond centered at $(x_0, y_0)$ with slack $S$ has corners $(x_0 - S, y_0)$, $(x_0 + S, y_0)$, $(x_0, y_0 - S)$, and $(x_0, y_0 + S)$.
  - In $(x', y')$, it becomes a square of side $2S$ centered at $(x'_0, y'_0)$, bounded by $x' = x'_0 - S$, $x' = x'_0 + S$, $y' = y'_0 - S$, and $y' = y'_0 + S$.
  - Figure: $F_r(i)$ with $f_i(i)$ and $f_o(i)$, and the feasible regions $F_r(j_1)$, $F_r(j_2)$, $F_r(j_3)$ of $S_j = \{j_1, j_2, j_3\}$; after the transformation, the intervals $I_{x'}(i)$, $I_{y'}(i)$, $I_{x'}(j)$, and $I_{y'}(j)$.
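The diamond-to-square mapping can be sketched as interval arithmetic. The code below assumes that a flip-flop's feasible region is the common overlap of the diamonds of its fanin and fanout gates; that reading, the function names, and the sample coordinates are assumptions made for this example.

```python
def diamond_to_intervals(x0, y0, s):
    """Diamond |x - x0| + |y - y0| <= s in C becomes a square of side 2s in C'."""
    xp0, yp0 = y0 + x0, y0 - x0
    return (xp0 - s, xp0 + s), (yp0 - s, yp0 + s)


def intersect(a, b):
    """Intersection of two closed intervals, or None if they do not overlap."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def feasible_intervals(diamonds):
    """Overlap of all diamonds (x0, y0, s) as a pair (I_x', I_y'), or None if empty."""
    result = None
    for x0, y0, s in diamonds:
        dx, dy = diamond_to_intervals(x0, y0, s)
        if result is None:
            result = (dx, dy)
            continue
        ix, iy = intersect(result[0], dx), intersect(result[1], dy)
        if ix is None or iy is None:
            return None
        result = (ix, iy)
    return result


# Hypothetical fanin gate at (5, 5) and fanout gate at (7, 5), both with slack 2.
region = feasible_intervals([(5, 5, 2), (7, 5, 2)])
```

Checking whether a point lies in the region then reduces to two interval tests in $(x', y')$ instead of a Manhattan-distance test per diamond.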
Outline
  - Introduction
  - Problem & properties
  - Algorithm - INTEGRA
  - Experimental results
  - Conclusion
Overview of INTEGRA
  1. Analyzes the design intent.
  2. Finds a decision point in X’ and extracts the essential flip-flops and their related flip-flops.
  3. Finds the maximal clique in the partial Y’ for each essential flip-flop.
  4. Clusters each essential flip-flop.
  5. Places the clustered flip-flop at a legal location with routing cost and density consideration.
  6. Repeats steps 2–5 until all flip-flops are investigated.
  - Flowchart: Initialization, Flip-flop clustering, Flip-flop placement, "Any more FFs?" (Y: repeat; N: done).
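Steps 2 and 3 can be illustrated with a small sweep over intervals. This is a sketch under stated assumptions, not the authors' implementation: it assumes the first decision point is the smallest end point in X’, that essential flip-flops are those whose x' interval ends there, and that related flip-flops are the others whose x' interval covers that point. All names and intervals are hypothetical.

```python
def first_decision_point(x_intervals):
    """Return (d, essential, related) for the smallest end point d in X'."""
    # Assumption: the first decision point is the smallest end point.
    d = min(end for _start, end in x_intervals.values())
    essential = sorted(n for n, (_s, e) in x_intervals.items() if e == d)
    related = sorted(n for n, (s, e) in x_intervals.items() if s <= d < e)
    return d, essential, related


def max_clique_with(essential, candidates, y_intervals):
    """Largest set containing `essential` whose y' intervals share a common point."""
    lo, hi = y_intervals[essential]
    # A common point can always be moved to the largest start among the members,
    # so only start points inside [lo, hi] need to be examined.
    points = {lo} | {y_intervals[c][0] for c in candidates
                     if lo <= y_intervals[c][0] <= hi}
    best = [essential]
    for p in sorted(points):
        members = [essential] + sorted(
            c for c in candidates
            if y_intervals[c][0] <= p <= y_intervals[c][1])
        if len(members) > len(best):
            best = members
    return best


# Hypothetical intervals for four flip-flops.
x_iv = {"A": (0, 4), "B": (1, 3), "C": (2, 8), "D": (5, 9)}
y_iv = {"A": (0, 5), "B": (2, 7), "C": (4, 9), "D": (1, 2)}

d, essential, related = first_decision_point(x_iv)
clique = max_clique_with(essential[0], related, y_iv)
```

The sketch stops at finding the clique; choosing an MBFF cell from the library for it and placing the result (steps 4 and 5) are not modeled here.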
Example (1/5)
  - Figure: the initial placement of flip-flops FF0 to FF7 in $(x, y)$ and the transformed view in $(x', y')$, on axes from 0 to 10.
Example (2/5) - Representation
  - Two interval graphs: one on x' and one on y', for flip-flops FF0 to FF7 (indices 0 to 7), on axes from 0 to 10.
  - x' intervals shown in the figure: [0,4], [1,3], [0,7], [1,9], [4,6], [0,9], [8,10], [2,8]
  - y' intervals shown in the figure: [7,10], [0,10], [5,9], [1,2], [0,5], [2,7], [7,8], [4,9]
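One way to hold the two interval graphs is as a pair of sorted end-point sequences, each of size 2n for n flip-flops. The sketch below assumes closed intervals and uses hypothetical names and intervals rather than the ones in the figure.

```python
START, END = 0, 1  # START sorts before END, so touching intervals count as overlapping


def endpoint_sequence(intervals):
    """Sorted list of (coordinate, kind, flip-flop) with two entries per flip-flop."""
    events = []
    for name, (start, end) in intervals.items():
        events.append((start, START, name))
        events.append((end, END, name))
    return sorted(events)


def overlap(p, q):
    """True if closed intervals p and q share at least one point."""
    return max(p[0], q[0]) <= min(p[1], q[1])


def can_share(a, b, x_intervals, y_intervals):
    """Two flip-flops are adjacent only if they overlap in both x' and y'."""
    return overlap(x_intervals[a], x_intervals[b]) and overlap(y_intervals[a], y_intervals[b])


# Hypothetical intervals for three flip-flops.
x_iv = {"FFa": (0, 4), "FFb": (1, 3), "FFc": (8, 10)}
y_iv = {"FFa": (0, 5), "FFb": (2, 7), "FFc": (1, 2)}

x_seq = endpoint_sequence(x_iv)
y_seq = endpoint_sequence(y_iv)
```

Both sequences grow linearly with the number of flip-flops, independent of how many grids or bins the layout has.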