

  1. 2019 Design Automation Conference
     Jihye Kwon*, Matthew M. Ziegler†, Luca P. Carloni*
     *Department of Computer Science, Columbia University, New York, NY, USA
     †IBM T. J. Watson Research Center, Yorktown Heights, NY, USA

  2. • Diverse application areas
       ✓ Movies, music, SNS posts, online shopping items, personalized tips
     • Two main paradigms
       ✓ Content filtering: matches user profiles/preferences against item content information
       ✓ Collaborative filtering: predicts the unknown (?) entries of a sparse user-item score matrix from the observed ones

  3. • VLSI design with CAD tools for Logic Synthesis and Physical Design (LSPD)
       ✓ Hierarchy of a high-performance processor: Chip → Processor core → Unit → Macro (10,000 to 100,000+ logic gates)
       ✓ Inputs: macro specification and LSPD parameter configuration (scenario)
       ✓ LSPD flow (Phase-1 and Phase-2), with stages: logic synthesis, physical placement, clock tree synthesis, post-placement optimization, routing, post-route optimization
       ✓ Outputs: estimated Quality-of-Result (QoR), e.g., timing, power, routability, for design-space exploration; layout for fabrication

  4. Iterative LSPD parameter tuning
       ✓ Macro (RTL, constraints, linked libraries) + LSPD parameter configuration → parallel LSPD flow runs → QoR statistics → QoR cost analysis (cost function)
     Proposed recommender system
       ✓ Offline learning: LSPD results archive → data filter & normalization → QoR model (with hyper-parameters)
       ✓ Online recommendation: QoR model + cost function + macro, either legacy (in archive) or new (not observed) → recommended scenarios, with online feedback
       ✓ The archive spans multiple macros per chip, tapeouts per chip, chips per technology, and technology nodes

  5. • The archive contains sparse records of
       ✓ Macro: RTL description, timing and physical constraints, linked libraries
       ✓ Scenario: configuration of binary meta-parameters for tuning LSPD flows
       ✓ QoR: normalized QoR scores for each of the d metrics (e.g., 5) for each macro
     LSPD results archive (input: Macro, Scenario; output: normalized QoR):
       Macro  Scenario  Slack 1  Slack 2  Slack 3  Power  Congestion
       m1     1000⋯0    0.42     0.56     0.34     0.88   0.76
       m1     0110⋯0    0.89     0.87     0.68     0.75   0.60
       m1     1010⋯1    0.92     0.84     0.56     0.65   0.54
       m1     0101⋯1    0.27     0.30     0.40     0.45   0.63
       ⋮
       m2     1000⋯0    0.34     0.22     0.50     0.56   0.83
       m2     1011⋯0    0.51     0.63     0.74     0.66   0.77
       ⋮
     • Goal: to build a QoR prediction model $F$:
       $F(\text{Macro}, \text{Scenario}) = (\text{QoR}_1, \dots, \text{QoR}_d)$
       where the Macro and Scenario inputs are NOT easily available or quantifiable as features
       → A collaborative filtering approach
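The archive records above can be sketched as plain data: a scenario is the set of binary meta-parameters switched on (printed as a bit string such as `1000⋯0`), and each record pairs it with one value per QoR metric. The slides say the stored QoR scores are normalized but not how, so the per-metric min-max scaling below is an assumption, and `Record`, `scenario_bits`, and `minmax_normalize` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

NUM_PARAMS = 250  # binary meta-parameters per scenario (from the archive description)


@dataclass(frozen=True)
class Record:
    macro: str            # hypothetical macro identifier, e.g. "m1"
    scenario: frozenset   # indices of the meta-parameters set to 1
    qor: tuple            # one value per QoR metric (e.g., 3 slacks, power, congestion)


def scenario_bits(scenario, n=NUM_PARAMS):
    """Render a scenario as the bit string shown in the archive table."""
    return "".join("1" if k in scenario else "0" for k in range(n))


def minmax_normalize(records):
    """Scale each QoR metric to [0, 1] over the given records.

    Assumption: min-max scaling per metric; the slides only state that
    the stored QoR scores are normalized.
    """
    columns = list(zip(*(r.qor for r in records)))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    normalized = []
    for r in records:
        scaled = tuple(
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(r.qor, lows, highs)
        )
        normalized.append(Record(r.macro, r.scenario, scaled))
    return normalized


# Toy usage with made-up raw values for two runs of one macro.
runs = [
    Record("m1", frozenset({0}), (10.0, 2.0)),
    Record("m1", frozenset({1, 2}), (30.0, 2.0)),
]
print(scenario_bits(frozenset({0}), n=4))       # 1000
print([r.qor for r in minmax_normalize(runs)])  # [(0.0, 0.0), (1.0, 0.0)]
```

Storing a scenario as a set of active indices rather than a 250-element vector keeps each record small, since most meta-parameters are off in any given run.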

  6. • Goal: to build a QoR prediction model $F$
       ✓ A collaborative filtering approach, e.g., matrix factorization for a movie recommender system:
         (user, movie) scores ≈ user matrix × movie matrix
         User matrix rows: (1, −1), (−1, 1); movie matrix columns: (0.7, 0.3), (0.2, 0.8), (0.9, 0.1), (0.4, 0.6)
         Predicted scores for user (−1, 1):
           $(-1, 1) \cdot (0.2, 0.8) = 0.6$
           $(-1, 1) \cdot (0.9, 0.1) = -0.8$
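The movie example can be checked numerically: every (user, movie) score is the dot product of a user row and a movie column. The matrix entries below are the ones read off the slide; NumPy is an assumed dependency.

```python
import numpy as np

# User matrix: one row of latent features per user.
users = np.array([[1.0, -1.0],
                  [-1.0, 1.0]])
# Movie matrix: one column of latent features per movie.
movies = np.array([[0.7, 0.2, 0.9, 0.4],
                   [0.3, 0.8, 0.1, 0.6]])

# Every (user, movie) score, observed or not, is a dot product.
scores = users @ movies
print(np.round(scores, 2))
```

The second row contains 0.6 and −0.8 for the second and third movies, matching the worked predictions. In a real recommender the two matrices are not given; they are fitted to the observed scores, and the same dot product then fills in the missing entries.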

  7. • Goal: to build a QoR prediction model $F$
       ✓ (Macro, Scenario) scores: 1,000 macros × scenarios (150,000 observed out of $2^{250}$ possible), each entry holding $\text{QoR}_1, \dots, \text{QoR}_d$ (300,000 observations)
         → Extremely sparse for highly tuned scenarios
       ✓ (Macro, Parameter, QoR metric) scores: latent tensor $T$ (1,000 macros × 250 LSPD parameters × 5 QoR metrics), defined by CP tensor decomposition with 50 latent features:
         $T_{iak} = \sum_{r=1}^{50} M_{ir} P_{ar} Q_{kr}$
         with macro matrix $M$ (1000 × 50), parameter matrix $P$ (250 × 50), and QoR metric matrix $Q$ (5 × 50)
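A minimal NumPy sketch of the CP construction, using the sizes on the slide (1,000 macros, 250 parameters, 5 metrics, 50 latent features). The random factor values are placeholders, not learned values, and `cp_tensor` is a hypothetical helper.

```python
import numpy as np

N_MACROS, N_PARAMS, N_METRICS, RANK = 1000, 250, 5, 50

rng = np.random.default_rng(0)
M = rng.normal(scale=0.1, size=(N_MACROS, RANK))   # macro matrix (placeholder values)
P = rng.normal(scale=0.1, size=(N_PARAMS, RANK))   # parameter matrix
Q = rng.normal(scale=0.1, size=(N_METRICS, RANK))  # QoR metric matrix


def cp_tensor(M, P, Q):
    """T[i, a, k] = sum over r of M[i, r] * P[a, r] * Q[k, r]."""
    return np.einsum("ir,ar,kr->iak", M, P, Q)


T = cp_tensor(M, P, Q)
print(T.shape)  # (1000, 250, 5)

# The factors hold far fewer numbers than the tensor they define.
n_factor_values = M.size + P.size + Q.size
print(n_factor_values, T.size)  # 62750 1250000
```

Because $T$ is fully determined by $M$, $P$, and $Q$, the latent scores are described by about 63K numbers instead of 1.25M tensor entries.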

  8. • Goal: to build a QoR prediction model $F$
       ✓ Macro matrix $M$, parameter matrix $P$, QoR matrix $Q$ → latent tensor $T$ (CP tensor decomposition)
       ✓ A single-layer perceptron network $G$ for QoR prediction (regression): for macro $m_i$, the fibers $T_{ia:}, T_{ib:}, T_{ic:}$ of the parameters $a, b, c$ set in the scenario are projected and normalized (divided by the number of parameters) into the latent layer, which feeds the output layer:
         $F(\text{Macro } m_i, \text{Scenario } p_a \cdot p_b \cdot p_c; T) = G(T_{ia:}, T_{ib:}, T_{ic:})$
       ✓ Learn $(M, P, Q, G)$ by a stochastic gradient descent (SGD) method
         ➢ Objective: to minimize the prediction error (RMSE)
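One way to read this diagram in code, under stated assumptions: the latent layer averages the fibers $T_{ia:}$ over the active parameters (the "divide by no. of parameters" step), and the output layer of $G$ is taken to be linear ($W h + b$) and trained on per-sample squared error, whose mean is the square of the RMSE. The average can be computed without building $T$, because the mean of $T_{ia:}$ over active $a$ equals $Q(M_{i:} \odot \bar{p})$, where $\bar{p}$ is the mean of the active rows of $P$. The slides do not give the activation, learning rate, or initialization; sizes, data, and the names `forward`, `sgd_step`, and `rmse` are hypothetical.

```python
import numpy as np


def forward(i, active, M, P, Q, W, b):
    """Predict the QoR vector of macro i for a scenario whose set
    parameters are listed in `active`."""
    p_bar = P[active].mean(axis=0)  # mean of active parameter rows, shape (R,)
    u = M[i] * p_bar                # shape (R,)
    h = Q @ u                       # latent layer: mean of T[i, a, :] over active a
    y_hat = W @ h + b               # output layer (assumed linear)
    return y_hat, p_bar, u, h


def sgd_step(i, active, y, M, P, Q, W, b, lr):
    """One SGD update of (M, P, Q, W, b) on 0.5 * ||y_hat - y||^2, in place."""
    y_hat, p_bar, u, h = forward(i, active, M, P, Q, W, b)
    e = y_hat - y                      # prediction error, shape (d,)
    dh = W.T @ e                       # gradient w.r.t. the latent layer
    du = Q.T @ dh                      # gradient w.r.t. u
    grad_W = np.outer(e, h)
    grad_Q = np.outer(dh, u)
    grad_Mi = du * p_bar
    grad_Pa = du * M[i] / len(active)  # identical for every active row
    W -= lr * grad_W
    b -= lr * e
    Q -= lr * grad_Q
    M[i] -= lr * grad_Mi
    P[active] -= lr * grad_Pa


def rmse(data, M, P, Q, W, b):
    errors = [forward(i, a, M, P, Q, W, b)[0] - y for i, a, y in data]
    return float(np.sqrt(np.mean(np.square(errors))))


# Toy problem: hypothetical sizes and random targets in [0, 1].
rng = np.random.default_rng(0)
n_macros, n_params, d, R = 20, 30, 5, 8
M = rng.normal(scale=0.1, size=(n_macros, R))
P = rng.normal(scale=0.1, size=(n_params, R))
Q = rng.normal(scale=0.1, size=(d, R))
W, b = np.eye(d), np.zeros(d)
data = [(int(rng.integers(n_macros)),
         rng.choice(n_params, size=3, replace=False),
         rng.uniform(size=d)) for _ in range(200)]

before = rmse(data, M, P, Q, W, b)
for epoch in range(20):
    for i, active, y in data:
        sgd_step(i, active, y, M, P, Q, W, b, lr=0.05)
after = rmse(data, M, P, Q, W, b)
print(f"train RMSE: {before:.3f} -> {after:.3f}")
```

Each step touches only the macro row and the active parameter rows of one record, which keeps SGD updates cheap even for an archive of 300,000 results.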

  9. Online recommendation: QoR model $F = (M, P, Q, G)$ + QoR cost function (e.g., weights) + macro → recommended scenarios (+ designer's parameter settings)
     • Legacy macros
       ✓ Target macro $m_i$ in the archive
       ✓ Use $F = (M[i], P, Q, G)$
     • New macros
       ✓ Sample LSPD results for a new macro
       ✓ Train $F^* = (m^*, P, Q, G)$ with $P, Q, G$ fixed (to learn $m^*$)
       → Use $F^* = (m^*, P, Q, G)$ for inference
     • Either way, inference takes minutes, instead of applying an LSPD flow taking hours
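A sketch of the new-macro path and of ranking scenarios with a weighted cost, assuming $G$ averages the CP fibers over the active parameters and then applies a linear layer $W h + b$. Only the macro vector $m^*$ is updated; $P$, $Q$, and the layer weights stay fixed, as on the slide. The cost `-weights · QoR` assumes a higher normalized score is better, and `predict`, `fit_new_macro`, and `recommend` are hypothetical names.

```python
import numpy as np


def predict(m, active, P, Q, W, b):
    """Predicted QoR for a macro latent vector m (assumed model form)."""
    h = Q @ (m * P[active].mean(axis=0))
    return W @ h + b


def fit_new_macro(samples, P, Q, W, b, rank, lr=0.05, epochs=200, seed=0):
    """Learn m* from sampled (active parameters, QoR) results of a new macro.

    P, Q, W, and b are read but never modified.
    """
    rng = np.random.default_rng(seed)
    m = rng.normal(scale=0.1, size=rank)
    for _ in range(epochs):
        for active, y in samples:
            p_bar = P[active].mean(axis=0)
            e = predict(m, active, P, Q, W, b) - y
            m -= lr * p_bar * (Q.T @ (W.T @ e))  # gradient of 0.5 * ||e||^2 w.r.t. m
    return m


def recommend(m, candidates, weights, P, Q, W, b, top_k=3):
    """Return the top_k candidate scenarios with the lowest weighted cost."""
    costs = [-float(weights @ predict(m, active, P, Q, W, b)) for active in candidates]
    order = np.argsort(costs)
    return [candidates[j] for j in order[:top_k]]


# Toy usage: a random "trained" model and a hidden macro to recover.
rng = np.random.default_rng(1)
R, n_params, d = 8, 30, 5
P = rng.normal(scale=0.5, size=(n_params, R))
Q = rng.normal(scale=0.5, size=(d, R))
W, b = np.eye(d), np.zeros(d)
m_true = rng.normal(size=R)
samples = []
for _ in range(40):
    active = rng.choice(n_params, size=3, replace=False)
    samples.append((active, predict(m_true, active, P, Q, W, b)))

m_star = fit_new_macro(samples, P, Q, W, b, rank=R)
candidates = [rng.choice(n_params, size=3, replace=False) for _ in range(10)]
print(recommend(m_star, candidates, np.ones(d), P, Q, W, b))
```

For a legacy macro, the stored row `M[i]` would be passed to `recommend` directly, with no fitting step.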

  10. LSPD results archive
       ✓ 1,000 macros in 14 nm chip designs and tapeouts
       ✓ 250 binary meta-parameters
       ✓ 300,000 LSPD flow results
       ✓ 150,000 distinct scenarios
       ✓ 80% train set, 20% validation set

  11. 5 macros from industrial 14 nm processors
       Macro  Logic function                     Logic gates  Runtime (hours)
       FP     Floating-point pipeline            75 K         8.0
       ECDT   Execution control & data transfer  45 K         6.2
       IDEC   Instruction decode                 210 K        21.6
       ISC    Instruction sequencing control     77 K         13.1
       LSC    L2 cache control & FSM             195 K        12.3
       [Chart comparing: Recommended + Designer's (Ours), Designer's Setting, Default Flow]

  12. Same 5 macros from industrial 14 nm processors (FP, ECDT, IDEC, ISC, LSC; see slide 11)
       [Chart comparing: Iterative Tuning, Recommended (Ours), 50 Sample Parameters, Default Flow]

  13. • Collaborative recommendation for VLSI design
       • Data from LSPD flow runs of industrial high-performance processors
       • Reduced computational (LSPD) cost for design-space exploration
       • Many unique and unobserved scenarios recommended
       • The model learned for 14 nm designs used for a 7 nm design in progress
