From Recovering Time to Timing Recovery: Some Challenges for the TAU Community



  1. From Recovering Time to Timing Recovery: Some Challenges for the TAU Community • Andrew B. Kahng, Depts. of CSE and ECE, UC San Diego • abk@ucsd.edu • http://vlsicad.ucsd.edu/~abk

  2. TAU-2016 Keynote: “In Search of Lost Time” • “Recovering Time”: machine learning, optimization, margin reduction, …

  3. Agenda • Motivations

  4. Design Crises: Cost, Expertise, Unpredictability • Design cost: not scaling • Design, process roadmaps not coupled • Figure: Andreas Olofsson, DARPA, ISPD-2018 keynote • Quality: also not scaling • Design Capability Gap • Available density: 2x/node • Realizable density: 1.6x/node • Figure: UCSD / 2013 ITRS

  5. Design is Too Difficult! • Tools and flows have steadily increased in complexity • Modern P&R tool: 10,000+ commands/options • Hard to design with latest tools in latest technologies • Even harder to predict quality, schedule • Expert users required • Increased cost and risk not good for industry! • Still have “CAD” mindset more than “DA” mindset • Again: assumes expert users • How do we escape this “local minimum”?

  6. IDEA: No-Humans, 24-Hours (A. Olofsson, DARPA, ISPD-2018 keynote) • Part of DARPA Electronics Resurgence Initiative • Traditional focus: ultimate quality • New focus = ultimate ease of use • No humans, 24-hour TAT (turnaround time) = “equivalent scaling” • Overarching goal: designer access to silicon

  7. DARPA IDEA and POSH Programs, 2018-2022 • https://vlsicad.ucsd.edu/NEWS18/dac_v5_DISTAR.pdf

  8. theopenroadproject.org

  9. OpenROAD: A New Design Paradigm • 24 hours, no humans – no PPA loss • Figure labels: machine learning of tools, flows; restricted layout; extreme partitioning; parallel optimization; design complexity; mindsets → quality, schedule, cost • Achieve predictability from the user’s POV • Use cloud/parallel to recover solution quality • Focus on reducing time and effort = schedule, cost • Machine Learning is CENTRAL to this

  10. The OpenROAD Project • Initial target: digital IC flow “RTL to GDS” • Open source • No-human-in-loop • Limited “knobs”, restricted field of use • Must replace intelligent humans (partition, floorplan, …)

  11. Agenda • Motivations • OpenROAD + Initial Target

  12. Initial Target: RTL-to-GDS Layout Generation • Inputs: .v, .sdc, .lib, .lef • .def, .spef in point tools • config files required • pre-characterizations required • Outputs: post-route .def, timing/power estimates • V1.0 release: June 2020 • Flow (figure): Verilog + .lib, .sdc, .lef → Logic Synthesis → Floorplan/PDN → Placement → Clock Tree Synthesis → Global and Detailed Routing → Layout Finishing → GDSII

  13. Placement • https://github.com/abk-openroad/RePlAce • RePlAce features: timing-driven (OpenSTA timer integrated); mixed-size (macros + cells); electrostatics analogy in analytic placement • RePlAce used in: physical synthesis; floorplanning; clock tree synthesis; traditional standard-cell placement • BSD-3 License • Flow (figure): .def from FP/PDN (+ .v, .sdc, .lef, .lib) → Placement → Placed .def

  14. RePlAce: Routability-Driven Placement • Global routing during routability-driven global placement • Figure: routability-driven loop

  15. Static Timing Analysis • https://github.com/abk-openroad/OpenSTA • OpenSTA: open-source static timing analysis tool • Developer: James Cherry (Parallax Software) • Tested with ASAP7, GF14, TSMC16, ST28, etc. • GPLv3 license

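  16. Slack, WNS, TNS: 28nm • aes_cipher_top (28nm, 12T, clkp=1000ps) • Figure panels: Reg-to-Reg; Reg-to-Out/In-to-Reg

      | aes_cipher_top    | WNS (ps) | TNS (ps) | #viol. |
      |-------------------|----------|----------|--------|
      | Signoff STA       | -61      | -289     | 7      |
      | OpenSTA (arnoldi) | -57      | -314     | 9      |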

  17. Slack, WNS, TNS: 16nm • Coyote (16nm, 9T, clkp=2000ps)

      | Coyote   | Signoff STA | OpenSTA   |
      |----------|-------------|-----------|
      | WNS (ns) | -0.660      | -0.603    |
      | TNS (ns) | -1758.004   | -1219.239 |
      | #viol.   | 8096        | 6926      |
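The three metrics compared on the two preceding slides can be computed from per-endpoint slacks. The following is a minimal sketch, not OpenSTA's implementation. It assumes the common convention that WNS is the most negative endpoint slack (reported as 0 when nothing violates), TNS is the sum of negative endpoint slacks, and #viol. is the count of negative-slack endpoints. The function name and the sample slacks are hypothetical.

```python
from typing import Iterable, NamedTuple


class SlackSummary(NamedTuple):
    wns: float        # worst negative slack (0.0 if no endpoint violates)
    tns: float        # total negative slack (sum over violating endpoints)
    violations: int   # number of endpoints with negative slack


def summarize_slacks(endpoint_slacks: Iterable[float]) -> SlackSummary:
    """Reduce per-endpoint slacks to WNS, TNS and violation count."""
    negative = [s for s in endpoint_slacks if s < 0.0]
    wns = min(negative) if negative else 0.0
    tns = sum(negative, 0.0)
    return SlackSummary(wns, tns, len(negative))


# Hypothetical endpoint slacks in ps (not taken from the slides).
example_slacks = [12.0, -61.0, -40.0, 3.5, -8.0]
summary = summarize_slacks(example_slacks)
print(summary)
```

Running the same reduction on endpoint slacks from two timers gives a like-for-like comparison of the kind shown in the tables.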

  18. Challenges for the TAU Community • #1. Help improve open-source STA engine • In particular: OpenSTA • Delay calculation, SI analysis, advanced timing models, MCMM, … • Priorities = ? • Will revisit:

      |          | Signoff STA | OpenSTA   |
      |----------|-------------|-----------|
      | WNS (ns) | -0.660      | -0.603    |
      | TNS (ns) | -1758.004   | -1219.239 |
      | #viol.   | 8096        | 6926      |

  19. The OpenROAD Project • Initial target: digital IC flow “RTL to GDS” • Open source • No-human-in-loop • Limited “knobs”, restricted field of use • Must replace intelligent humans (partition, floorplan, …)

  20. Agenda • Motivations • OpenROAD + Initial Target • Machine Learning

  21. ML in IC Design: Not Like Chess or Cat Pics • Getting to self-driving IC design: not so obvious • Do recent ML successes transfer well? • 3-week SP&R&Opt run is NOT like playing chess! • Design lives in a {servers, licenses, schedule} box • Distributions of outcomes matter (slide annotation: cloud, parallel) • A “stack of models” is mandatory: predictions of downstream outcomes are also optimization objectives • Still uncharted road to self-driving tools and flows • How do we overcome “small, expensive data” challenges? • Standards: learning comes from {design + tool + technology}, all of which are highly proprietary • Need mechanisms for IP-preserving sharing of data and models

  22. 4 Stages of ML to Recover Time, Effort • Four Stages of Machine Learning: (1) Mechanization and Automation; (2) Orchestration of Search and Optimization; (3) Pruning via Predictors and Models; (4) From Reinforcement Learning through Intelligence • Huge space of tool, command, option trajectories through design flow

  23. Stage 3. Modeling and Prediction • Prediction of tool- and design-specific outcomes over longer and longer subflows • Wiggling of longer and longer ropes • Enables pruning and termination → avoid wasted design resources • Simple way to think about it: “identify doomed X” • Doomed floorplan, Opt run, DRoute run, … • Allocate resources elsewhere • Better outcome within given resource budget • Complementary dream: new heuristics and tools that are inherently more predictable and modelable → lessen chaos • Ensembles might be modeled/predicted • Prediction requirement might be relaxed: “get user into a ballpark”?

  24. Generic Need: Predicting Doomed Runs • NOTE: “Doomed” is often w.r.t. timing, or due to fear of timing!!! • Picture: progressions of #DR violations in commercial router • Simple approach: track and project metrics as time series • Can use a Markov decision process (MDP): a “GO” vs. “STOP” strategy card terminates “doomed runs” early
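The "track and project metrics as time series" idea can be sketched as follows. This is a deliberately simple trend extrapolation, not the MDP strategy card mentioned on the slide. It assumes violation counts decay roughly geometrically, fits a line to log(count + 1) over the most recent iterations, and projects to the final iteration. The function names, the window size and the thresholds are all hypothetical.

```python
import math
from typing import Sequence


def project_final_violations(drv_counts: Sequence[int],
                             total_iterations: int,
                             window: int = 4) -> float:
    """Extrapolate the violation count at the last router iteration.

    Fits a least-squares line to log(count + 1) over the last `window`
    observed iterations and evaluates it at iteration total_iterations - 1.
    """
    if not drv_counts:
        raise ValueError("need at least one observed iteration")
    recent = list(drv_counts)[-window:]
    if len(recent) < 2:
        return float(recent[-1])
    start = len(drv_counts) - len(recent)
    xs = [float(start + i) for i in range(len(recent))]
    ys = [math.log(c + 1.0) for c in recent]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    projected_log = intercept + slope * (total_iterations - 1)
    return max(0.0, math.exp(projected_log) - 1.0)


def go_or_stop(drv_counts: Sequence[int],
               total_iterations: int,
               max_final_violations: float) -> str:
    """Return "GO" if the projected final count is acceptable, else "STOP"."""
    projected = project_final_violations(drv_counts, total_iterations)
    return "GO" if projected <= max_final_violations else "STOP"


# Hypothetical progressions of violation counts over the first iterations.
converging_run = [10000, 5000, 2500, 1250]
stuck_run = [10000, 9500, 9400, 9350]
print(go_or_stop(converging_run, total_iterations=20, max_final_violations=100))
print(go_or_stop(stuck_run, total_iterations=20, max_final_violations=100))
```

A real policy would also weigh the cost of a false "STOP" against the compute saved, which is where a decision-process formulation fits.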

  25. Obtaining Golden From Non-Golden • ML shifts the Accuracy-Cost Tradeoff Curve (for free)!

  26. DATE14, SLIP15 (Old) Example: ML-based Timer Correlation • Flow (figure): ONE-TIME: artificial circuits and real designs → train / validate / test → MODELS (path slack, setup time, stage, cell, wire delays); INCREMENTAL: new designs; outliers (data points) if error > threshold • Plots (figure): T2 path slack (ns) vs. T1 path slack (ns), BEFORE and AFTER ML modeling: 123 ps → 31 ps, ~4× reduction
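The timer-correlation idea (learn a mapping from timer T1's path slacks to timer T2's) can be illustrated with a one-feature least-squares fit. This is a sketch under strong assumptions. The models on the slide use several features (path slack, setup time, stage, cell and wire delays), whereas this example uses T1 slack only. The slack values are made up for illustration, and the fit is evaluated on its own training points rather than on a held-out test set.

```python
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of ys ~ slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def max_abs_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Largest absolute divergence between two lists of path slacks."""
    return max(abs(p - a) for p, a in zip(predicted, actual))


# Hypothetical path slacks (ns) for the same paths from two timers.
t1_slack = [-0.50, -0.30, -0.10, 0.00, 0.10]
t2_slack = [-0.62, -0.39, -0.17, -0.05, 0.06]

slope, intercept = fit_line(t1_slack, t2_slack)
t2_predicted: List[float] = [slope * s + intercept for s in t1_slack]

divergence_before = max_abs_error(t1_slack, t2_slack)      # raw T1 vs. T2
divergence_after = max_abs_error(t2_predicted, t2_slack)   # modeled T1 vs. T2
print(f"before: {divergence_before:.3f} ns, after: {divergence_after:.3f} ns")
```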

  27. ICCD18 Lately: Predicting PBA from GBA • PBA (Path-Based Analysis) is less pessimistic than GBA (Graph-Based Analysis) • But, can have MUCH more expensive runtime! • ML task: predict PBA timing from GBA timing • → Improved quality of results in P&R, optimization • → Less-expensive timing analysis usable earlier in flow • Figure panels: GBA Mode, PBA Mode

  28. Bigram- and CART-based Modeling • Bigram-based path modeling • Classification and regression tree (CART) approach • Model based on 13 bigram parameters • Reduced GBA pessimism vs. PBA • https://vlsicad.ucsd.edu/Publications/Conferences/361/c361.pdf
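Bigram-based path modeling can be read as sliding a window of two consecutive stages along a timing path and describing each pair with a small feature vector, which a regression tree then consumes. The sketch below shows only that windowing step. The slide does not list the 13 bigram parameters, so the per-stage fields used here (`gba_delay`, `slew`, `fanout`) and the five resulting features are hypothetical placeholders, not the published feature set.

```python
from typing import Dict, List, Tuple

# Hypothetical per-stage record; the field names are illustrative only.
Stage = Dict[str, float]


def path_bigrams(stages: List[Stage]) -> List[Tuple[Stage, Stage]]:
    """Return every pair of consecutive stages along a timing path."""
    return list(zip(stages, stages[1:]))


def bigram_features(first: Stage, second: Stage) -> List[float]:
    """Flatten one stage pair into a feature row for a tree-based model."""
    return [
        first["gba_delay"], second["gba_delay"],
        first["slew"], second["slew"],
        second["fanout"],
    ]


# A made-up three-stage path (delays and slews in ns).
example_path: List[Stage] = [
    {"gba_delay": 0.042, "slew": 0.015, "fanout": 2.0},
    {"gba_delay": 0.051, "slew": 0.020, "fanout": 4.0},
    {"gba_delay": 0.038, "slew": 0.012, "fanout": 1.0},
]
feature_rows = [bigram_features(a, b) for a, b in path_bigrams(example_path)]
print(feature_rows)
```

Each row would be paired with a PBA-derived target during training. How per-bigram predictions are combined into a path-level slack is left open here.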

  29. DATE19 Lately: Reduce #Corners in STA and Opt • Want all the benefits of STA at N corners, but want to pay for analysis at only M << N corners • “Missing Corner Prediction” (“matrix completion”) saves runtime, licenses • “Primary corners” methodology → errors caught at signoff cause iteration

  30. “Missing Corners” = Matrix Completion • Predicting missing delay values = matrix completion problem • STA at relatively few known corners → reasonably accurate prediction of timing at all unknown corners • PCA: low-dimensional modeling problem
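To make the matrix-completion framing concrete, the sketch below treats delays as a paths × corners matrix with missing entries and fills them with a rank-1 model, delay[path][corner] ≈ u[path] · v[corner], fitted by alternating least squares on the observed entries. This is an illustrative stand-in rather than the method behind the slide, which refers to PCA-style low-dimensional modeling. The function name and the delay values are hypothetical, and real corner data would likely need more than one factor.

```python
from typing import List, Optional

Matrix = List[List[Optional[float]]]  # None marks an unanalyzed (path, corner)


def complete_rank1(delays: Matrix, iterations: int = 200) -> List[List[float]]:
    """Fill missing entries using a rank-1 fit u[i] * v[j] to observed ones."""
    n_paths, n_corners = len(delays), len(delays[0])
    u = [1.0] * n_paths     # per-path scale
    v = [1.0] * n_corners   # per-corner scale
    for _ in range(iterations):
        # Update each path factor with corner factors held fixed.
        for i in range(n_paths):
            seen = [(v[j], d) for j, d in enumerate(delays[i]) if d is not None]
            den = sum(vj * vj for vj, _ in seen)
            if den > 0.0:
                u[i] = sum(vj * d for vj, d in seen) / den
        # Update each corner factor with path factors held fixed.
        for j in range(n_corners):
            seen = [(u[i], delays[i][j]) for i in range(n_paths)
                    if delays[i][j] is not None]
            den = sum(ui * ui for ui, _ in seen)
            if den > 0.0:
                v[j] = sum(ui * d for ui, d in seen) / den
    return [
        [delays[i][j] if delays[i][j] is not None else u[i] * v[j]
         for j in range(n_corners)]
        for i in range(n_paths)
    ]


# Hypothetical path delays (ps) at three corners; two entries were not analyzed.
observed: Matrix = [
    [100.0, None, 80.0],
    [200.0, 260.0, 160.0],
    [150.0, 195.0, 120.0],
    [300.0, 390.0, None],
]
completed = complete_rank1(observed)
print(round(completed[0][1], 3), round(completed[3][2], 3))
```

The example data is exactly rank 1 (each corner scales every path by the same factor), so the missing entries are recoverable. Measured delays would only approximately follow such a model.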

  31. Recent: Strong Design-Independent Models • Figure: error vs. # corners for megaboom (990K instances, 350K FF); model trained using initial artificial testcases vs. model trained using richer artificial testcases • 10X improvement!!

  32. Recent: “ML-LEAK” (leakage recovery predictor) • ML to predict how much leakage will be recovered if user runs {Tweaker, Tempus ECO, PTSI ECO, homegrown script, …} • Gives expectation of post-recovery power • Beneficial to methodology team when trying out various DOEs • Saves time for implementation team: skip leakage recovery if it won’t help • Blended model of design- and instance-level predictions gives best results • Figure: plot showing actual vs. predicted percentage change in leakage power after recovery • Power recovered in this design was 0.076%; our model predicts 1% power recovery for this graph
