Design Optimization by Fine-grained Interleaving of Local Netlist Transformations in Lagrangian Relaxation
Apostolos Stefanidis, Dimitrios Mangiras, Giorgos Dimitrakopoulos (Democritus University of Thrace, Greece)
Chrysostomos Nicopoulos (University of Cyprus, Cyprus)
David Chinnery (Mentor, a Siemens Business, USA)
June 30, 2020
Outline
• Design optimization
• Timing-power optimization using Lagrangian relaxation (LR)
• Embedding multiple heuristics inside the same multi-mode multi-corner LR optimization loop
• The criteria that each heuristic should satisfy to be compatible with LR-based optimization
• Order of applying each heuristic
• Experimental results based on benchmarks of the TAU 2019 contest
• Conclusions
A. Stefanidis / Democritus University of Thrace, Greece
Design optimization
• Gate-level netlist changes to optimize:
  • Timing: fix early/late violations
  • Reduce leakage/dynamic power, area, wire length, …
• Can be applied in any physical design step
• Additional considerations (e.g. SI noise) and the need for accuracy both increase through the flow
• Examples: sizing, relocating, buffer insertion
[Figure: an initial circuit and the same circuit after sizing, relocating, and buffer insertion]
Design optimization using Lagrangian relaxation
• Relaxes timing constraints into a simplified objective function
• Lagrangian multipliers (LMs) weigh the constraints to try to ensure that they are met
• Already successfully applied to:
  • Combinational gate sizing
  • Clock tree sizing
  • Timing-driven incremental placement
• Our proposal: embed multiple optimization heuristics in the same Lagrangian relaxation optimization loop
Problem formulation
$$\min \sum_{c \in \text{cells}} \big(P(c) + A(c)\big) - \sum_{j \in \text{POs}} slk_j^L - \sum_{j \in \text{POs}} slk_j^E$$
$$\text{s.t.:}\quad slk_j^L \le 0 \ \text{ and } \ slk_j^E \le 0, \quad \forall j \in \text{POs}$$
$$slk_j^L \le r_j^L - a_j^L \ \text{ and } \ slk_j^E \le a_j^E - r_j^E, \quad \forall j \in \text{POs}$$
$$a_i^L + d_{i \to j}^L \le a_j^L \ \text{ and } \ a_j^E \le a_i^E + d_{i \to j}^E, \quad \forall i \to j \in \text{arcs}$$
• $P(c)$: leakage of cell $c$
• $A(c)$: area of cell $c$
• $L$: late timing information
• $E$: early timing information
• $slk_j$: negative slack of pin $j$
• $r_j$: required time of pin $j$
• $a_j$: arrival time of pin $j$
• $d_{i \to j}$: delay of timing arc $i \to j$
• arcs: timing arcs of the design
• cells: cells of the design
• POs: primary outputs or timing endpoints of the design
• Target: minimize the sum of leakage power, area, and total negative slack (TNS)
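Lagrangian relaxation formulation (1)
Original problem:
$$\min \sum_{c \in \text{cells}} \big(P(c) + A(c)\big) - \sum_{j \in \text{POs}} slk_j^L - \sum_{j \in \text{POs}} slk_j^E$$
$$\text{s.t.:}\quad slk_j^L \le 0 \ \text{ and } \ slk_j^E \le 0, \quad \forall j \in \text{POs}$$
$$slk_j^L \le r_j^L - a_j^L \ \text{ and } \ slk_j^E \le a_j^E - r_j^E, \quad \forall j \in \text{POs}$$
$$a_i^L + d_{i \to j}^L \le a_j^L \ \text{ and } \ a_j^E \le a_i^E + d_{i \to j}^E, \quad \forall i \to j \in \text{arcs}$$
LMs attached to the constraints:
• $\lambda_{j0}^L, \lambda_{j1}^L$: late LMs for slack constraints on endpoints
• $\lambda_{i \to j}^L$: late LMs for late delay constraints on arcs
• $\lambda_{j0}^E, \lambda_{j1}^E$: early LMs for slack constraints on endpoints
• $\lambda_{i \to j}^E$: early LMs for early delay constraints on arcs
After Lagrangian relaxation:
$$\begin{aligned}
\min\ & \sum_{c \in \text{cells}} \big(P(c) + A(c)\big) - \sum_{j \in \text{POs}} slk_j^L - \sum_{j \in \text{POs}} slk_j^E \\
& + \sum_{j \in \text{POs}} \big(\lambda_{j0}^L\, slk_j^L + \lambda_{j0}^E\, slk_j^E\big) \\
& + \sum_{j \in \text{POs}} \big(\lambda_{j1}^L (slk_j^L - r_j^L + a_j^L) + \lambda_{j1}^E (slk_j^E - a_j^E + r_j^E)\big) \\
& + \sum_{i \to j \in \text{arcs}} \big(\lambda_{i \to j}^L (a_i^L + d_{i \to j}^L - a_j^L) + \lambda_{i \to j}^E (a_j^E - a_i^E - d_{i \to j}^E)\big)
\end{aligned}$$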
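Lagrangian relaxation formulation (2)
• $\lambda$ values represent the criticality of each constraint
• Karush-Kuhn-Tucker (KKT) optimality conditions:
$$\sum_{\forall i \in in_j} \lambda_{i \to j}^L = \sum_{\forall k \in out_j} \lambda_{j \to k}^L, \qquad \sum_{\forall i \in in_j} \lambda_{i \to j}^E = \sum_{\forall k \in out_j} \lambda_{j \to k}^E$$
• By applying the KKT conditions and simplifying:
$$\min \sum_{c \in \text{cells}} \big(P(c) + A(c)\big) + \sum_{i \to j \in \text{arcs}} \big(\lambda_{i \to j}^L d_{i \to j}^L - \lambda_{i \to j}^E d_{i \to j}^E\big)$$
[Figure: example circuit annotated with the KKT relations among $\lambda_{40}$, $\lambda_{2 \to 4}$, and $\lambda_{3 \to 4}$, and with $\lambda_{1 \to 3} = \lambda_{3 \to 4}$, each shown for both late and early LMs]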
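The flow-conservation form of the KKT conditions can be motivated with a short derivation sketch. It writes $\lambda$ for the LMs, $a_j$ for the arrival time of pin $j$, and $\mathcal{L}$ for the relaxed objective, and it assumes pin $j$ is an internal pin (not a timing endpoint), so that only arc terms contain $a_j$. In the late arc terms $\lambda_{i \to j}^L (a_i^L + d_{i \to j}^L - a_j^L)$, the arrival time $a_j^L$ enters with a plus sign for every arc leaving $j$ and a minus sign for every arc entering $j$; the signs are reversed in the early arc terms. Setting the derivative to zero gives:

$$\frac{\partial \mathcal{L}}{\partial a_j^L} = \sum_{k \in out_j} \lambda_{j \to k}^L - \sum_{i \in in_j} \lambda_{i \to j}^L = 0$$
$$\frac{\partial \mathcal{L}}{\partial a_j^E} = \sum_{i \in in_j} \lambda_{i \to j}^E - \sum_{k \in out_j} \lambda_{j \to k}^E = 0$$

Under these conditions every arrival-time term cancels, which is why only the products of LMs and arc delays remain in the simplified objective.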
Lagrangian multiplier updates
• Timing arc $i \to j$ affects the cost function by: $\lambda_{i \to j}^L d_{i \to j}^L - \lambda_{i \to j}^E d_{i \to j}^E$
• High late LM ⇒ delay should decrease ⇒ late critical arc
• High early LM ⇒ delay should increase ⇒ early critical arc
• Update method:
  • $\lambda_{i \to j}^L = \lambda_{i \to j}^L \cdot \dfrac{a_i^L + d_{i \to j}^L}{a_j^L}, \qquad \lambda_{j0}^L = \lambda_{j0}^L \cdot \dfrac{a_j^L}{r_j^L}$
  • $\lambda_{i \to j}^E = \lambda_{i \to j}^E \cdot \dfrac{a_j^E}{a_i^E + d_{i \to j}^E}, \qquad \lambda_{j0}^E = \lambda_{j0}^E \cdot \dfrac{r_j^E}{a_j^E}$
• LM values are propagated backwards proportionally to respect the KKT optimality conditions
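The multiplicative updates and the proportional backward propagation can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, all arrival and required times are assumed to be strictly positive, and the equal split used when all input LMs are zero is an assumption added here for robustness.

```python
def update_late_arc_lm(lm, a_src, d_arc, a_sink):
    """Late arc LM: the ratio is 1 for the arc that sets the sink's late
    arrival time and below 1 for less critical arcs."""
    return lm * (a_src + d_arc) / a_sink


def update_late_endpoint_lm(lm, arrival, required):
    """Late endpoint LM grows when the arrival exceeds the required time."""
    return lm * arrival / required


def update_early_arc_lm(lm, a_src, d_arc, a_sink):
    """Early arc LM: the ratio is 1 for the arc that sets the sink's early
    arrival time and below 1 for less critical arcs."""
    return lm * a_sink / (a_src + d_arc)


def update_early_endpoint_lm(lm, arrival, required):
    """Early endpoint LM grows when the arrival is before the required time."""
    return lm * required / arrival


def propagate_backwards(out_lm_sum, in_arc_lms):
    """Rescale the LMs of a pin's input arcs so that they sum to the LM
    leaving the pin, keeping their relative proportions (KKT condition)."""
    total = sum(in_arc_lms.values())
    if total == 0.0:
        # Assumption: split equally when no input arc carries any weight.
        share = out_lm_sum / len(in_arc_lms)
        return {arc: share for arc in in_arc_lms}
    return {arc: out_lm_sum * lm / total for arc, lm in in_arc_lms.items()}
```

For example, `propagate_backwards(2.0, {"a": 1.0, "b": 3.0})` keeps the 1:3 ratio between the two input arcs while making them sum to 2.0.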
Local LR cost
• Recalculate only timing information around the cell's local arcs
• Calculating the cost function for every timing arc of the design is avoided to save runtime
• $LC_v = P_v + A_v + \sum_{i \to j \in \text{local\_arcs}} \big(\lambda_{i \to j}^L d_{i \to j}^L - \lambda_{i \to j}^E d_{i \to j}^E\big)$
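As a minimal sketch, the local cost of a cell can be computed from its leakage, its area, and the LM-weighted delays of its local arcs. The `Arc` record and its field names are hypothetical, chosen only for this illustration; how the set of local arcs is collected is not shown.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Arc:
    """One local timing arc with its LMs and delays (hypothetical record)."""
    lm_late: float
    lm_early: float
    d_late: float
    d_early: float


def local_cost(leakage: float, area: float, local_arcs: Iterable[Arc]) -> float:
    """Leakage + area + sum over local arcs of (late LM * late delay
    - early LM * early delay)."""
    timing = sum(a.lm_late * a.d_late - a.lm_early * a.d_early for a in local_arcs)
    return leakage + area + timing
```

With leakage 2, area 3, and one arc with late LM 1, late delay 10, early LM 0.5, and early delay 8, the cost is 2 + 3 + (10 - 4) = 11.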
How the LR optimization loop works: gate sizing example
• Make decisions based on local information ⇒ timing is updated only on local arcs
• Have discrete choices ⇒ different size / Vt options
• Evaluate each choice using LM values ⇒ pick the choice with the lowest local cost
[Figure: size choices for a gate]
Incorporating design transformations in the LR loop
• Any transformation satisfying certain criteria can be applied inside the Lagrangian relaxation
• The method has to:
  • Make decisions based on local information
  • Have discrete choices
  • Evaluate each choice using LM values and the same local cost function
  • Apply small changes each iteration ⇒ allows LR to adapt to the change
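The interleaving idea can be sketched as a loop in which every iteration refreshes the LMs and then lets each transformation make its small local changes. This is only a structural sketch: the function name, the callable-based interface, and the choice to update the LMs before the transformations in each iteration are assumptions, not details taken from the slides.

```python
def run_lr_loop(design, transformations, update_multipliers, iterations):
    """Interleave several local transformations inside one LR loop.

    design             -- any mutable design representation (hypothetical)
    transformations    -- callables that each apply small local changes,
                          evaluated with the shared local LR cost
    update_multipliers -- callable that refreshes the LMs from current timing
    """
    for _ in range(iterations):
        update_multipliers(design)      # assumed to come first in each pass
        for transform in transformations:
            transform(design)           # e.g. sizing, pin swapping, buffering
    return design
```

Because every transformation sees the same LMs and the same local cost, the multipliers can adapt between iterations to whatever the previous pass changed.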
LR design transformations in this work
• In this work we apply five transformations inside the LR-based optimization loop:
  • Cell sizing
  • Pin swapping
  • Buffering for early violations
  • Buffering for late violations
  • Clock skew assignment
• All make local decisions based on LM values
Cell sizing
• Try every available option for the cell being resized
• Keep the option with the lowest local cost
• Options that cause load/slew/slack violations are rejected
• Applied on gates and flip-flops that are:
  • Early or late timing critical
  • Power/area critical
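The selection rule above can be sketched as a filter followed by a minimum. The names `choose_cell_option`, `local_cost`, and `violates` are hypothetical; returning `None` when every option is rejected (so the caller keeps the current cell) is an assumption of this sketch.

```python
def choose_cell_option(options, local_cost, violates):
    """Return the legal option with the lowest local LR cost.

    options    -- candidate sizes / Vt flavours for one cell
    local_cost -- callable mapping an option to its local LR cost
    violates   -- callable returning True if the option causes a
                  load, slew, or slack violation
    """
    legal = [opt for opt in options if not violates(opt)]
    if not legal:
        return None  # assumption: caller keeps the current cell unchanged
    return min(legal, key=local_cost)
```

For instance, if X4 has the lowest cost but causes a slew violation, the next cheapest legal option is returned instead.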
Handling of early/late timing conflicts
• Refers to cells with conflicting early and late timing violations
• LR-based resizing will try to balance the slacks based on LM values ⇒ slow convergence
• Solution: only include late LMs in the local cost function of these cells
  • Sizing focuses on late violations
  • Early violations will be solved by other methods (e.g. buffering)
[Figure: initial circuit, result with no conflict handling, and result with conflict handling by late slack focus]
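One way to picture the conflict rule is as a switch on the per-arc timing term of the local cost: for a cell flagged as conflicting, the early term is simply left out. The function name and the boolean flag are hypothetical; how a cell is detected as conflicting is not shown here.

```python
def arc_timing_cost(lm_late, d_late, lm_early, d_early, has_conflict):
    """Timing contribution of one local arc to the local LR cost.

    For cells with conflicting early/late violations only the late term
    is kept, so sizing focuses on the late violations.
    """
    cost = lm_late * d_late
    if not has_conflict:
        cost -= lm_early * d_early
    return cost
```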
Buffer insertion for fixing late timing violations
• Used for driving large net loads
• Applied on the outputs of cells with a high input-to-output capacitance ratio
• Try every buffer type and keep the option with the lowest local cost (adding no buffer is also an option)
Buffer insertion for fixing early (hold) timing violations
• Increase the delay on early-timing violating paths
• Where to add delay?
  • On the most critical path through all early violating endpoints
  • On the pin of that path with the highest late-early LM difference
• How much delay is added?
  • Only as much delay as does not degrade the late negative slack
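The "how much delay" rule can be illustrated with a small budget calculation. This sketch rests on an interpretation: the delay needed is taken to be the magnitude of the early violation, and "does not degrade the late negative slack" is read as capping the added delay by the positive late slack available at the same pin. The function name is hypothetical.

```python
def early_fix_delay(early_slack, late_slack):
    """Delay to insert at a pin to repair an early violation.

    early_slack -- early slack at the pin (negative means a violation)
    late_slack  -- late slack at the pin (positive means headroom)
    """
    needed = max(0.0, -early_slack)    # delay that would remove the early violation
    headroom = max(0.0, late_slack)    # delay the late path can absorb
    return min(needed, headroom)
```

With an early slack of -30 and a late slack of 20, only 20 units of delay are added; the remaining early violation is left for later iterations or other pins.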
Pin swapping
• Reconnect nets of logically equivalent pins to improve timing
• For each gate that has equivalent pins:
  • Find the most critical input net
  • Try to assign it to each other equivalent pin
  • Keep the option with the lowest local LR cost
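The per-gate procedure can be sketched as follows. The data layout (a pin-to-net dictionary, a per-net criticality score, and a cost callable) is hypothetical, and keeping the original connection when no swap lowers the cost is an assumption of this sketch.

```python
def best_pin_swap(assignment, criticality, cost_fn):
    """Try moving the most critical input net to each equivalent pin.

    assignment  -- dict: equivalent input pin -> net connected to it
    criticality -- dict: net -> criticality score (e.g. derived from LMs)
    cost_fn     -- callable: assignment dict -> local LR cost
    Returns the assignment with the lowest cost (original included).
    """
    # Pin currently driven by the most critical net.
    crit_pin = max(assignment, key=lambda p: criticality[assignment[p]])
    best, best_cost = dict(assignment), cost_fn(assignment)
    for pin in assignment:
        if pin == crit_pin:
            continue
        candidate = dict(assignment)
        # Swap the nets of the critical pin and the other equivalent pin.
        candidate[pin] = assignment[crit_pin]
        candidate[crit_pin] = assignment[pin]
        cost = cost_fn(candidate)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best
```

In a two-input gate where pin B is faster than pin A, the most critical net ends up on B if that lowers the local cost.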