FF-Bond: Multi-bit Flip-flop Bonding at Placement
Chang-Cheng Tsai, Yiyu Shi, Guojie Luo, Iris Hui-Ru Jiang
IRIS Lab NCTU – MST – PKU
ISPD-13

Outline
- Introduction
- Preliminaries
- Problem formulation
- Algorithm - FF-Bond
- Experimental results
- Conclusion

Multi-Bit Flip-Flops (MBFFs)
- Clock power is critical for modern IC designs: dynamic power $\propto C V^2 f$
- MBFFs present a smaller load on the clock network
- Replace FFs with MBFFs
  - Effectively reduces both clock network power and MBFF power
  - Prefer large FFs (high bit number), avoid orphans (single-bit FFs)
  - Avoid impacting timing-critical paths
[Figure: master/slave latch pairs of flip-flops (D1/Q1, D2/Q2) driven by clk]

Bit number   Normalized power per bit   Normalized area per bit
1            1.00                       1.00
2            0.86                       0.96
4            0.78                       0.71
(Larger bit numbers are more power efficient.)
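
To make the table concrete, the following minimal sketch totals normalized flip-flop power for a given mix of MBFF sizes. It assumes only the example 1-/2-/4-bit library values shown on the slide; the names `POWER_PER_BIT` and `total_ff_power` are hypothetical and not part of FF-Bond.

```python
# Normalized power per bit, copied from the example library table (assumed values).
POWER_PER_BIT = {1: 1.00, 2: 0.86, 4: 0.78}

def total_ff_power(counts):
    """Return total normalized flip-flop power.

    counts maps a bit number (1, 2, or 4) to how many cells of that size are used.
    Each cell contributes bits * power_per_bit.
    """
    return sum(bits * n * POWER_PER_BIT[bits] for bits, n in counts.items())

# Eight flip-flop bits realized three different ways.
singles = total_ff_power({1: 8})   # eight single-bit flip-flops
duals = total_ff_power({2: 4})     # four dual-bit MBFFs
quads = total_ff_power({4: 2})     # two four-bit MBFFs
print(singles, duals, quads)
```

With these table values, two 4-bit MBFFs cost 6.24 units against 8.00 for eight single-bit flip-flops, which is why large MBFFs are preferred and orphans are avoided.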

Prior Work
- Relocating flip-flops benefits clock network synthesis: [Cheon+, DAC05], [Papa+, Micro11], [Lee/Markov, TCAD12]
- Replacing flip-flops with MBFFs saves clock power: [Yan/Chen, ICGCS10], [Chang+, ICCAD10], [Wang+, ISPD11], [Jiang+, ISPD11], [Liu+, DATE12]
  - Focus on post-placement MBFF clustering
- Pre-placement: lacks physical information
- At-placement: MBFF bonding
- Post-placement: cells are immovable; limited clustering flexibility and quality

One Possible Solution …
- Directly integrate placement and post-placement MBFF clustering
[Flow chart: Netlist, Timing-driven placement, MBFF clustering, End]
- The movement of flip-flops
  - Is constrained by the placement at the current iteration
  - May oscillate among iterations

Ionic Bonding and Flip-flop Bonding
- Goal: guide flip-flops towards merging-friendly locations
- Ionic bonding: Na + F → NaF (an electron e- moves from Na to F, giving Na+ and F-)
- Flip-flop bonding, example: MBFF library with 1-bit, 2-bit, 4-bit cells; mergeable flip-flop sets
http://en.wikipedia.org/wiki/Ionic_bond

Post-Placement vs. At-Placement - s38417
Flow                        # MBFFs (4-/2-/1-bit)
Post-placement clustering   35/252/237
At-placement bonding        159/105/35
[Figure: placement plots of s38417 for both flows; legend: MBFF, SBFF]

Post-Placement MBFF Clustering
- Given
  - A placed design
  - MBFF library
  - Timing slacks of flip-flops
- Replace FFs with MBFFs
  - Minimize flip-flop power
  - Satisfy timing constraints
[Figure: single-bit flip-flops (SBFF) merged into MBFFs by MBFF clustering]

Intersection Graph
- Define the feasible region of a flip-flop according to its slack
- Model the overlap of feasible regions by an intersection graph
- A proper-sized clique corresponds to an MBFF
[Figure: left, the feasible region of flip-flop i in the x-y plane, bounded by the fanin slack around fi(i) and the fanout slack around fo(i); right, the intersection graph over flip-flops 1 to 8]
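
The feasible-region idea can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: each slack is assumed to be already converted into a Manhattan-distance budget around the fanin or fanout pin, the feasible region is taken as the intersection of the two resulting diamonds, and the rotation x' = x + y, y' = y - x is one common choice of coordinate transformation that turns diamonds into axis-aligned squares. All function names are hypothetical.

```python
from itertools import combinations

def diamond_to_rect(cx, cy, r):
    """Map the diamond |x-cx| + |y-cy| <= r to a square in (x', y') = (x+y, y-x)."""
    u, v = cx + cy, cy - cx
    return (u - r, u + r, v - r, v + r)  # (x'_lo, x'_hi, y'_lo, y'_hi)

def intersect(a, b):
    """Intersect two rectangles in rotated coordinates; None if they are disjoint."""
    lo_u, hi_u = max(a[0], b[0]), min(a[1], b[1])
    lo_v, hi_v = max(a[2], b[2]), min(a[3], b[3])
    if lo_u > hi_u or lo_v > hi_v:
        return None
    return (lo_u, hi_u, lo_v, hi_v)

def feasible_region(fanin, fanout):
    """fanin and fanout are (x, y, distance_budget) triples (assumed inputs)."""
    return intersect(diamond_to_rect(*fanin), diamond_to_rect(*fanout))

def intersection_graph(regions):
    """Return edges (i, j), i < j, for flip-flops whose feasible regions overlap."""
    edges = set()
    for (i, a), (j, b) in combinations(sorted(regions.items()), 2):
        if a is not None and b is not None and intersect(a, b) is not None:
            edges.add((i, j))
    return edges

# Three made-up flip-flops: 1 and 2 are close together, 3 is far away.
regions = {
    1: feasible_region((0, 0, 4), (2, 0, 4)),
    2: feasible_region((3, 1, 3), (5, 1, 3)),
    3: feasible_region((20, 20, 2), (22, 20, 2)),
}
print(intersection_graph(regions))
```

In the rotated frame every feasible region is a rectangle, so an overlap test reduces to two interval comparisons, which is what makes the interval-based clique search practical.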

INTEGRA (INTErval GRAph)
- Perform coordinate transformation
- Sort starting (s) and ending (e) points of projection in the x' and y' axes
[Figure: feasible regions of FF1 to FF8 in the x-y plane]
TYPE: s s s s e e s s s e s e e e e e
FF#:  1 2 3 4 1 2 5 6 7 3 8 4 5 7 6 8
Jiang et al., "INTEGRA: Fast multibit flip-flop clustering for clock power saving," TCAD12, ISPD11.

INTEGRA
- Find decision points: 'se' (a start point immediately followed by an end point) in the x' axis
- Retrieve maximal cliques at decision points
- Check x' and y' axes: {1, 2, 4} or {1, 3, 4}
- Form MBFFs of proper sizes, e.g., {1, 2}
[Figure: feasible regions of FF1 to FF8 in the x'-y' plane, with start points s1 to s4 and end points e1 to e4 marked]
TYPE: s s s s e e s s s e s e e e e e
FF#:  1 2 3 4 1 2 5 6 7 3 8 4 5 7 6 8
Decision points are the 'se' transitions in the TYPE row.
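
The decision-point scan along one axis can be sketched as follows. This is a simplified single-axis illustration, not the full INTEGRA procedure: it only collects the set of open intervals at each 'se' transition on the x' axis, and the additional check against the y' axis is left out. The function names are hypothetical; the event list is the TYPE/FF# sequence shown on the slide.

```python
def sorted_events(intervals):
    """Build the sorted (type, ff) sequence from {ff: (lo, hi)} projections.

    Ties are broken so that a start sorts before an end (closed intervals).
    """
    points = []
    for ff, (lo, hi) in intervals.items():
        points.append((lo, 0, "s", ff))
        points.append((hi, 1, "e", ff))
    return [(kind, ff) for _, _, kind, ff in sorted(points)]

def x_axis_candidates(events):
    """Return the set of open intervals at every 'se' decision point."""
    active, candidates, prev = set(), [], None
    for kind, ff in events:
        if kind == "s":
            active.add(ff)
        else:
            if prev == "s":              # 's' followed by 'e': decision point
                candidates.append(frozenset(active))
            active.discard(ff)
        prev = kind
    return candidates

# Sequence from the slide: TYPE row zipped with FF# row.
types = "s s s s e e s s s e s e e e e e".split()
ffs = [1, 2, 3, 4, 1, 2, 5, 6, 7, 3, 8, 4, 5, 7, 6, 8]
slide_events = list(zip(types, ffs))
for clique in x_axis_candidates(slide_events):
    print(sorted(clique))
```

On the slide's sequence this yields three x'-axis candidates, {1, 2, 3, 4}, {3, 4, 5, 6, 7}, and {4, 5, 6, 7, 8}; the y' check then narrows each candidate to sets that overlap in both axes.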

Example: INTEGRA
- Example: MBFF library with 1-bit, 2-bit, 4-bit cells
[Figure: feasible regions of FF1 to FF8 in the x'-y' plane, repeated in four panels. Panel labels: "2 dual-bit flip-flops", "1 four-bit flip-flop", "2 four-bit flip-flops", and "Guide FFs towards merging-friendly locations"]

The MBFF Bonding at Placement Problem
- Given
  - Gate-level netlist
  - MBFF library
  - Timing constraints
- Find a placement and replace FFs with MBFFs
  - Minimize flip-flop power
  - Satisfy timing constraints

The Overview of FF-Bond
- Guide flip-flops towards merging-friendly locations at the global placement stage without sacrificing timing
- Design flow: Netlist, global placement (FF-Bond), legalization, detailed placement, clock tree synthesis, routing, End
- Inside FF-Bond (global placement):
  - Objective function construction with timing-driven net weighting (driven by a signoff timer)
  - Gradient-based optimization solver
  - Evenly distributed? (overlap index < d1, Y/N)
  - Sparse enough? (overlap index < d2, Y/N)
  - Flip-flop bonding: MBFF clustering, then pseudo-net generation
- Overlap index: $\text{overlap index} = \dfrac{\text{overlapped\_area}}{\text{total\_cell\_area}}$

Example: s38584
[Figure: placement snapshots of s38584 shown beside the FF-Bond flow chart (objective function construction with timing-driven net weighting, gradient-based optimization solver, the "Evenly distributed?" and "Sparse enough?" checks, flip-flop bonding with MBFF clustering and pseudo-net generation)]
- Spread cells until sparse enough
- Apply flip-flop bonding

Timing-Driven Placement
- Pure wirelength-driven placement + slack-based net weighting
- Pure wirelength-driven analytical placement: mPL
  $$\min\; W(x, y) = \sum_{e \in E} \Big( \max_{v_i, v_j \in e,\; i<j} |x_i - x_j| + \max_{v_i, v_j \in e,\; i<j} |y_i - y_j| \Big) \quad \text{s.t.}\quad D_{ij} = K,\; 1 \le i \le m,\; 1 \le j \le n$$
- Smooth the objective function and the constraints
  - Log-sum-exp approximation:
    $$W(x, y) = \eta \sum_{e \in E} \Big( \log \sum_{v_k \in e} \exp\frac{x_k}{\eta} + \log \sum_{v_k \in e} \exp\frac{-x_k}{\eta} + \log \sum_{v_k \in e} \exp\frac{y_k}{\eta} + \log \sum_{v_k \in e} \exp\frac{-y_k}{\eta} \Big)$$
  - Inverse Laplace transformation:
    $$\min\; W(x, y) \quad \text{s.t.}\quad \psi_{ij} = K,\; 1 \le i \le m,\; 1 \le j \le n$$
- Slack-based net weighting:
  $$\text{net weight} = \Big( 1 - \frac{\text{slack}}{T_{clk}} \Big)^{\alpha},\quad \alpha > 1$$
  - slack = 0 for the 1st iteration (pure wirelength-driven placement)
T. Chen et al., "Multilevel generalized force-directed method for circuit placement," ISPD05.
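
A small numerical sketch may help relate the exact half-perimeter objective to its log-sum-exp smoothing. It is written under assumptions: nets are lists of cell indices, the net weight simply scales each net's wirelength term (a common convention, not confirmed by the slide), and slack never exceeds the clock period. The function names are hypothetical and this is not the mPL code.

```python
import math

def hpwl(nets, x, y):
    """Exact objective: per net, the largest |x_i - x_j| plus the largest |y_i - y_j|."""
    total = 0.0
    for net in nets:
        xs, ys = [x[v] for v in net], [y[v] for v in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def _lse(values, eta):
    """eta * log(sum(exp(v / eta))), shifted by the maximum to avoid overflow."""
    m = max(values)
    return m + eta * math.log(sum(math.exp((v - m) / eta) for v in values))

def lse_wirelength(nets, x, y, eta, weights=None):
    """Smooth approximation of hpwl; approaches it from above as eta shrinks."""
    total = 0.0
    for k, net in enumerate(nets):
        xs, ys = [x[v] for v in net], [y[v] for v in net]
        w = 1.0 if weights is None else weights[k]  # assumed: weight scales the net term
        total += w * (_lse(xs, eta) + _lse([-v for v in xs], eta)
                      + _lse(ys, eta) + _lse([-v for v in ys], eta))
    return total

def net_weight(slack, t_clk, alpha):
    """Slack-based weight (1 - slack / T_clk) ** alpha; assumes slack <= t_clk."""
    return (1.0 - slack / t_clk) ** alpha

# Two made-up nets over four cells.
nets = [[0, 1, 2], [1, 3]]
x = [0.0, 4.0, 1.0, 6.0]
y = [0.0, 0.0, 3.0, 2.0]
exact = hpwl(nets, x, y)
smooth = lse_wirelength(nets, x, y, eta=0.01)
print(exact, smooth, net_weight(0.0, 1.0, 2.0))
```

The smooth value is never below the exact one and the gap is bounded by a multiple of eta, so shrinking eta trades differentiability for accuracy. A slack of zero gives a weight of exactly 1, which matches the pure wirelength-driven first iteration.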

Flip-Flop Bonding
- Bond flip-flops into perfect-sized cliques
- Perfect: most power efficient
Example:
Bit number   Normalized power per bit   Normalized area per bit
1            1.00                       1.00
2            0.86                       0.96
4            0.78                       0.71
[Figure: example cliques labeled oversized, perfect, and undersized]

Flip-Flop Bonding
- Bond flip-flops into perfect-sized cliques
- Priority of processing maximal cliques: perfect > undersized > oversized
  - Perfect size: preserved
  - Undersized/oversized: try to form a target-sized clique by selecting the nearest flip-flops in a specified search region
- The target size: the flip-flop configuration that is larger than, nearest to, and more power efficient than the investigated clique size
- Adjacency inside the search region (physical and timing):
  $$|x_c - x_i| + |y_c - y_i| - \varepsilon_i \times \big( s_{fi}(i) + s_{fo}(i) \big)$$
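
A hedged sketch of the nearest-flip-flop selection follows. It assumes that (x_c, y_c) is a reference point of the clique being grown, that the two slack terms are summed before being scaled by the per-flip-flop factor (the grouping is not fully recoverable from the slide), and that a smaller value means a better candidate. `FF`, `adjacency`, and `pick_nearest` are hypothetical names.

```python
from typing import NamedTuple

class FF(NamedTuple):
    x: float
    y: float
    s_fi: float        # fanin slack
    s_fo: float        # fanout slack
    eps: float = 1.0   # assumed per-flip-flop factor converting slack to distance

def adjacency(center, ff):
    """Manhattan distance to the clique reference point minus the slack credit."""
    xc, yc = center
    return abs(xc - ff.x) + abs(yc - ff.y) - ff.eps * (ff.s_fi + ff.s_fo)

def pick_nearest(center, candidates, need):
    """Return the names of the `need` candidates with the smallest adjacency value."""
    ranked = sorted(candidates, key=lambda name: (adjacency(center, candidates[name]), name))
    return ranked[:need]

# Made-up candidates inside a search region around the clique at (0, 0).
candidates = {
    "a": FF(1.0, 1.0, 0.5, 0.5),   # distance 2, slack credit 1
    "b": FF(3.0, 0.0, 0.0, 0.0),   # distance 3, no slack
    "c": FF(2.0, 2.0, 2.0, 2.0),   # distance 4, slack credit 4
}
print(pick_nearest((0.0, 0.0), candidates, need=2))
```

Here "c" is physically the farthest candidate but has the most slack, so it ranks first; this is the sense in which the measure mixes physical and timing information.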

Example: Flip-Flop Bonding (1/3)
- Extract maximal cliques
[Figure: flip-flops and their maximal cliques]
Bit number   Normalized power per bit   Normalized area per bit
1            1.00                       1.00
2            0.86                       0.96
4            0.78                       0.71

Example: Flip-Flop Bonding (2/3)
- Bonding strategy
  - Choose an undersized clique with priority 3 > 2 > 1
    - Select nearest flip-flops to form target-sized cliques: 3 → 4, 2 → 4, 1 → 2
  - Choose an oversized clique
    - Select nearest flip-flops to form target-sized cliques: even → 4X, odd → even
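
The size rules on this slide can be written as a small lookup. This is a minimal sketch assuming the 1-/2-/4-bit example library and reading "even → 4X" as "grow to the next multiple of four"; `target_size` is a hypothetical helper, not a function from FF-Bond.

```python
# Targets for undersized cliques, as listed on the slide.
UNDERSIZED_TARGET = {1: 2, 2: 4, 3: 4}

def target_size(n):
    """Return the clique size to aim for, given the current clique size n."""
    if n <= 0:
        raise ValueError("clique size must be positive")
    if n in UNDERSIZED_TARGET:
        return UNDERSIZED_TARGET[n]
    if n == 4:
        return 4                  # perfect size: preserved
    if n % 2 == 1:
        return n + 1              # oversized and odd: grow to the next even size
    return -(-n // 4) * 4         # oversized and even: next multiple of four (assumed reading)

for size in range(1, 9):
    print(size, "->", target_size(size))
```

The number of flip-flops to pull in from the search region is then `target_size(n) - n`, which is zero for a clique that already has a perfect size.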