A Learning Bridge from Architectural Synthesis to Physical Design for Exploring Power Efficient High-Performance Adders
Subhendu Roy (1), Yuzhe Ma (2), Jin Miao (1), Bei Yu (2)
(1) Cadence Design Systems; (2) The Chinese University of Hong Kong
ISLPED’17

Optimality across EDA stages
[Figure: EDA flow stages: Architectural Synthesis, Logic Synthesis, Physical Design]
There is no one-to-one mapping between metrics across the various EDA stages.
◮ Optimality at one stage doesn’t guarantee optimality at another stage.
◮ A data-driven methodology, such as machine learning, becomes necessary.

Binary Adder Design
◮ Primary building blocks in the datapath logic of a microprocessor
◮ A fundamental problem in the VLSI industry for the last several decades
What is still unsolved? Closing the gap across adder design stages.

Parallel Prefix Adders
◮ Parallel prefix adders → flexible delay-power trade-off
◮ Regular adders → sub-optimal
◮ Custom adders → high turnaround time (TAT)
This work: automatic custom adders

Architectural Level: Mapped to Prefix Structures
[Figure: 8-bit adder in three stages]
◮ Pre-processing: inputs a_i, b_i (i = 0..7) produce the generate/propagate pairs (g_i, p_i)
◮ Parallel prefix structure: computes the carries c_0 ... c_7 from the (g_i, p_i) pairs
◮ Post-processing: produces the sum bits s_0 ... s_7 from a_i, b_i and the carries; C_out = c_7
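The three stages can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `combine` and `prefix_add` are hypothetical, the carry-in is assumed to be 0, and the prefix stage is written as a simple serial chain rather than a parallel structure (any valid prefix graph yields the same carries).

```python
def combine(hi, lo):
    """Prefix operator on (generate, propagate) pairs; hi covers the more significant bits."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def prefix_add(a, b, n=8):
    """Add two n-bit integers via pre-processing, prefix computation, post-processing."""
    # Pre-processing: per-bit generate and propagate signals.
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]

    # Prefix computation (serial chain for clarity): c[i] is the carry out of bit i.
    acc = (g[0], p[0])
    c = [acc[0]]
    for i in range(1, n):
        acc = combine((g[i], p[i]), acc)
        c.append(acc[0])

    # Post-processing: s_0 = p_0 and s_i = p_i XOR c_{i-1}.
    s = [p[0]] + [p[i] ^ c[i - 1] for i in range(1, n)]
    total = sum(bit << i for i, bit in enumerate(s))
    return total, c[n - 1]  # (sum bits as an integer, C_out = c_{n-1})
```

Replacing the serial loop with a different combination order changes only the shape of the prefix graph (its size, level, and fan-out), not the result.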

Prefix Graph Problem
Carry computation can be mapped to a prefix graph problem: $y_i = x_i \circ x_{i-1} \circ x_{i-2} \circ \dots \circ x_1 \circ x_0$
[Figure: 6-bit prefix graph with inputs x_5 ... x_0 and output y_5]
◮ Size (s) = number of prefix nodes = 7
◮ Level (L) = maximum logic level = 3
◮ Max-fanout (mfo) = 2
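The three metrics can be computed directly from a fan-in description of the graph. The sketch below is illustrative only: the dictionary representation, the function name `graph_metrics`, and the 4-bit example graph are assumptions (the slide's 6-bit graph cannot be recovered from the text), and fan-out is assumed to be counted over inputs as well as prefix nodes.

```python
def graph_metrics(fanins):
    """Return (size, level, max_fanout) of a prefix graph.

    fanins maps each prefix node to its two fan-ins; names that do not
    appear as keys are primary inputs at level 0.
    """
    level = {}

    def node_level(v):
        if v not in fanins:
            return 0
        if v not in level:
            level[v] = 1 + max(node_level(u) for u in fanins[v])
        return level[v]

    # Fan-out of a node = number of prefix nodes that use it as a fan-in.
    fanout = {}
    for ins in fanins.values():
        for u in ins:
            fanout[u] = fanout.get(u, 0) + 1

    size = len(fanins)
    depth = max(node_level(v) for v in fanins)
    return size, depth, max(fanout.values())


# Hypothetical 4-bit graph (not the 6-bit graph from the slide).
example = {
    "y1": ("x1", "x0"),
    "i1": ("x3", "x2"),
    "y2": ("x2", "y1"),
    "y3": ("i1", "y1"),
}
print(graph_metrics(example))  # (size, level, max fan-out)
```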

Classifying Prefix Graph Synthesis
Approaches can be classified by the number of solutions they produce.
Category 1: Limited number of solutions
◮ Examples: [Matsunaga+,GLSVLSI’07], [Liu+,ICCAD’03], [Zhu+,ASPDAC’05], [Roy+,ASPDAC’15]
  - Not suitable for exploring data-driven methodologies
  - No analytical model linking to the physical design stage
Category 2: Innumerable solutions
◮ Example: [Roy+,TCAD’14]
  - Not scalable for bounded fan-out
  - Computationally expensive to run all solutions through the full physical design flow

Gap between Prefix Structure and Physical Design
[Figure: (a) Architectural solution space, node size vs. max fanout; (b) Physical design space, area (µm²) vs. critical delay (ns). Both panels mark the groups G1 and G2.]
◮ G1: lower fan-out and larger size; G2: higher fan-out and smaller size
◮ When mapped to the physical solution space:
  - Size and area are correlated
  - The correlation is not completely reliable: G1 and G2 get mixed up in the physical solution space
What we want to search for: all Pareto-frontier points with low area, low power, and low critical delay.
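To make the search target concrete, a Pareto frontier over (delay, area, power) can be extracted with a simple dominance filter. This is a generic sketch, not the authors' code; the function names and the sample numbers are hypothetical and chosen only for illustration.

```python
def dominates(p, q):
    """True if p is no worse than q in every metric and strictly better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def pareto_front(points):
    """Keep the points that no other point dominates (all metrics are minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (delay ps, area um^2, power uW) triples, for illustration only.
designs = [
    (340.0, 2200.0, 7700.0),
    (350.0, 1900.0, 6500.0),
    (355.0, 2000.0, 6800.0),  # dominated by the second design
    (360.0, 1800.0, 6000.0),
]
front = pareto_front(designs)
```

The filter is quadratic in the number of points, which is fine for a few thousand sampled adders.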

Task 1: Prefix Adder Solution Exploration
[Figure: power (µW) vs. critical delay (ps) for TCAD’14 solutions]

[Roy+,TCAD’14]: Summary
[Figure: G_2, G_3, G_4, ..., G_n, G_{n+1}]
◮ G_n = set of prefix graphs of bit-width n
◮ Prefix graphs of higher order are generated in a bottom-up fashion
◮ Several pruning strategies are applied during G_n → G_{n+1} for scaling
  - For bounded fan-out, these strategies compromise size-optimality

Enhancement 1: Imposing Semi-regularity
◮ The concept is derived from regular adders such as Brent-Kung and Sklansky.
◮ x_i and x_{i+1} are combined to form prefix nodes, where i is even.
◮ This regularity is imposed only at level L = 1.
◮ For L > 1, regularity compromises size-optimality, so it is forbidden.
◮ Observation: this semi-regularity doesn’t degrade size-optimality.
[Figure: 8-bit prefix graph with inputs x_7 ... x_0]
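As a small illustration of the level-1 rule, the forced pairings for an n-bit graph can be enumerated as follows. The function name `level_one_nodes` and the (upper index, lower index) tuple encoding are hypothetical.

```python
def level_one_nodes(n):
    """Pair (x_{i+1}, x_i) for every even i, giving the fixed level-1 prefix nodes."""
    return [(i + 1, i) for i in range(0, n - 1, 2)]


print(level_one_nodes(8))
```

For n = 8 this pairs x_1 with x_0, x_3 with x_2, x_5 with x_4, and x_7 with x_6; levels above 1 are left unconstrained.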

Enhancement 2: Level Restriction in Non-trivial Fan-in
◮ The trivial fan-in of a node is the fan-in that has the same MSB as the node.
◮ In the figure, x_4 and i_1 are the trivial and non-trivial fan-ins of i_2, respectively.
◮ Restriction: level(non-trivial fan-in) ≥ level(trivial fan-in)
◮ This reduces the search space without degrading size-optimality.
[Figure: 6-bit prefix graph with inputs x_5 ... x_0, internal nodes i_1 and i_2, and output y_5]

Comparison at Prefix Graph Stage
         Our Approach            [Roy+,TCAD’14]
mfo    size   Run-time (s)    size   Run-time (s)
  4     244       302          252       241
  6     233       264          238       212
  8     222       423            -         -
 12     201       193            -         -
 16     191        73          192       149
 32     185      0.04          185      0.04
◮ The table is for 64-bit adders.
◮ [Roy+,TCAD’14] cannot find solutions for all fan-outs (no result for mfo = 8 and 12).
◮ Our sizes are never larger, and are strictly smaller for mfo = 4, 6, and 16.
◮ Run-times are comparable, and adder synthesis is a one-time cost.

Physical Solution Space Comparison
[Figure: power (µW) vs. critical delay (ps) for TCAD’14 solutions and ours]
Our solutions cover a wider space in the physical domain.
◮ 7000 random samples from [Roy+,TCAD’14] vs. 3000 samples from our approach
◮ Reason: TCAD’14 misses solutions for bounded fan-out in a few cases

Task 2: Pareto Frontier Driven Learning
[Figure: power (µW) vs. critical delay (ps), showing the real Pareto frontier (Real PF) and the representative adders (Rep. Adder)]

Quasi-Random Data Sampling
◮ There are hundreds of thousands of solutions.
◮ How should the training data be chosen?
  - Too many architectures cannot be run, because the physical design flow is costly.
  - Too few will degrade model accuracy.
Quasi-random sampling: create architectural bins based on mfo and s.
◮ Cover all architectural bins
◮ Select solutions from each bin randomly
[Figure: bins of solutions, e.g. the bin with s = 246 and mfo = 4; bins shown for mfo = 4 (s = 244, 245, 246) and mfo = 6 (s = 233, 234, 235)]
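The binning scheme can be sketched as below. This is a minimal illustration under assumptions: solutions are represented as dictionaries with `mfo` and `s` keys, and the function name, the `per_bin` parameter, and the fixed seed are hypothetical choices rather than details from the slides.

```python
import random
from collections import defaultdict


def quasi_random_sample(solutions, per_bin, seed=0):
    """Bin solutions by (mfo, s), then draw up to per_bin solutions at random from every bin."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    bins = defaultdict(list)
    for sol in solutions:
        bins[(sol["mfo"], sol["s"])].append(sol)

    picked = []
    for key in sorted(bins):  # visit every architectural bin
        members = bins[key]
        picked.extend(rng.sample(members, min(per_bin, len(members))))
    return picked
```

Because every bin contributes at least one solution, sparsely populated regions of the (mfo, s) space are still represented in the training data, which plain uniform sampling would not guarantee.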

Feature Selection and Learning Model
◮ Architectural attributes: s, mfo, sum-path-fanout (spfo)
◮ Tool settings: target delay
◮ Best model fit: support vector regression (SVR) with an RBF kernel
◮ Including spfo improves the MSE score for delay from 0.232 to 0.164
◮ Note: linear models are not sufficient for modeling delay
Example (4-bit graph with inputs x_3 ... x_0, internal node i_1, and outputs y_1 and y_3):
$\mathrm{spfo}(y_1) = \mathrm{spfo}(x_0) + \mathrm{spfo}(x_1) + \mathrm{fo}(x_0) + \mathrm{fo}(x_1) = 0 + 0 + 1 + 1 = 2$
$\mathrm{spfo}(i_1) = \mathrm{spfo}(x_3) + \mathrm{spfo}(x_2) + \mathrm{fo}(x_3) + \mathrm{fo}(x_2) = 0 + 0 + 1 + 2 = 3$
$\mathrm{spfo}(y_3) = \mathrm{spfo}(i_1) + \mathrm{spfo}(y_1) + \mathrm{fo}(i_1) + \mathrm{fo}(y_1) = 3 + 2 + 1 + 2 = 8$
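The spfo recurrence in the example can be sketched as a short recursive function. The representation is an assumption: `fanins` maps each prefix node to its two fan-ins, and the fan-out values are taken as given from the slide's example rather than derived from the full graph, which the text does not preserve.

```python
def spfo(node, fanins, fanout, memo=None):
    """Sum-path-fanout: spfo of both fan-ins plus the fan-out of both fan-ins.

    Primary inputs (names absent from fanins) have spfo 0.
    """
    if memo is None:
        memo = {}
    if node not in fanins:
        return 0
    if node not in memo:
        memo[node] = sum(spfo(u, fanins, fanout, memo) + fanout[u] for u in fanins[node])
    return memo[node]


# Structure and fan-out values as stated in the slide's worked example.
fanins = {"y1": ("x1", "x0"), "i1": ("x3", "x2"), "y3": ("i1", "y1")}
fanout = {"x0": 1, "x1": 1, "x2": 2, "x3": 1, "i1": 1, "y1": 2}

print(spfo("y1", fanins, fanout), spfo("i1", fanins, fanout), spfo("y3", fanins, fanout))
# prints: 2 3 8
```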

Pareto Frontier Driven Learning
◮ Conventional learning focuses on prediction accuracy
  - Improving model accuracy doesn’t guarantee a better Pareto frontier
  - Hence the need for Pareto-frontier exploration integrated with learning
◮ Scalarization, or α-sweep
  - The learning output is a linear sum of delay and power (α × Power + Delay)
  - Model fitting is done with different values of α
  - α is swept from 0 to a large positive number
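The selection step of an α-sweep can be sketched as follows. This is a simplification, not the authors' flow: in the slides a regression model is fitted for each α to predict the scalarized value, whereas here the model is omitted and the cost α × Power + Delay is evaluated directly on given (delay, power) pairs. The function name, the candidate values, and the α grid are all hypothetical.

```python
def alpha_sweep(candidates, alphas):
    """For each alpha, pick the candidate that minimizes alpha * power + delay."""
    chosen = []
    for alpha in alphas:
        best = min(candidates, key=lambda c: alpha * c[1] + c[0])
        if best not in chosen:
            chosen.append(best)
    return chosen


# Hypothetical (delay ps, power uW) pairs, for illustration only.
candidates = [(340.0, 7720.0), (346.0, 6670.0), (350.0, 7000.0), (353.0, 5900.0)]
selected = alpha_sweep(candidates, alphas=[0.0, 0.006, 1.0])
```

With α = 0 the fastest design wins, and with a large α the lowest-power design wins; intermediate values pick trade-off points in between. A weighted sum of this kind can only reach points on the convex hull of the frontier, so the α grid and the number of retained solutions per α both affect how much of the frontier is recovered.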

Experimental Setup
Synthesis and placement/routing of adders
◮ Tools: Design Compiler / IC Compiler
◮ Library: non-linear delay model (NLDM) in the 32nm SAED cell library
◮ Tool settings: target delay = 0.1 ns, 0.2 ns, 0.3 ns
Programming languages
◮ C++ for prefix adder synthesis
◮ Python-based machine learning package scikit-learn
Machine configuration
◮ UNIX machine with 72 GB RAM
◮ 2.8 GHz CPU

Pareto-frontier Comparison
[Figure: (left) power (µW) vs. critical delay (ps); (right) area (µm²) vs. critical delay (ps). Each panel shows Real PF, Predicted PF, and Rep. Adder.]
The predicted Pareto frontier almost matches the actual Pareto frontier.
◮ The training set is randomly selected from 300 samples.
◮ Representative (Rep.) adders are quasi-randomly sampled from the other 3000 samples.
◮ The predicted frontier is drawn from the 150 best predicted solutions.

Comparison with Other Adders
Each Pareto point derived from our approach beats the baseline listed directly above it in all metrics (delay, area, power).
Method               Delay (ps)   Area (µm²)   Power (mW)
Kogge-Stone            347.9        2563.7        8.78
Ours (P1)              340.0        2203.3        7.72
Sklansky               356.1        1792.5        6.1
Ours (P2)              353.0        1753.0        5.9
[Roy+,ASPDAC’15]       348.7        1971.4        6.98
Ours (P3)              346.0        1848.6        6.67
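The size of each gain can be read off the table with a short calculation. The numbers below are copied from the table; the pairing of each of our points with the baseline above it, and the names `rows`, `pairs`, and `improvement`, are assumptions made for this sketch.

```python
# (delay ps, area um^2, power mW) from the comparison table.
rows = {
    "Kogge-Stone": (347.9, 2563.7, 8.78),
    "Ours (P1)": (340.0, 2203.3, 7.72),
    "Sklansky": (356.1, 1792.5, 6.1),
    "Ours (P2)": (353.0, 1753.0, 5.9),
    "[Roy+,ASPDAC'15]": (348.7, 1971.4, 6.98),
    "Ours (P3)": (346.0, 1848.6, 6.67),
}
# Assumed pairing: each of our points against the baseline listed above it.
pairs = [
    ("Ours (P1)", "Kogge-Stone"),
    ("Ours (P2)", "Sklansky"),
    ("Ours (P3)", "[Roy+,ASPDAC'15]"),
]


def improvement(ours, base):
    """Percentage reduction in each metric relative to the baseline."""
    return tuple(100.0 * (b - o) / b for o, b in zip(ours, base))


gains = {(o, b): improvement(rows[o], rows[b]) for o, b in pairs}
for (o, b), (d, a, p) in gains.items():
    print(f"{o} vs {b}: delay -{d:.1f}%, area -{a:.1f}%, power -{p:.1f}%")
```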
