
Discrete Buffer and Wire Sizing for Link-Based Non-Tree Clock Network - PowerPoint Presentation



  1. Discrete Buffer and Wire Sizing for Link-Based Non-Tree Clock Network
     Rupak Samanta, Jiang Hu and Peng Li
     Department of Electrical and Computer Engineering, Texas A&M University

  2. Outline
     • Introduction
     • Preliminary of Support Vector Machine
     • Sizing Algorithm
     • Experimental Results
     • Conclusion

  3. Challenges
     [Figure: power constraint and PVT variations as technology scales from 130nm to 90nm and below 65nm]

  4. Challenges in Clock Network
     • The clock network is a sub-circuit that faces both challenges
       – variability and power consumption
     • A well-known approach for making skew tolerant to variation is the clock mesh
       – large wire/power overhead
     • A link-based non-tree clock network provides a trade-off between variation tolerance and power overhead

  5. Our Objective
     "To investigate optimizing a link-based clock network through discrete buffer and wire sizing"

  6. Review of Link Insertion
     • A. Rajaram, J. Hu and R. Mahapatra, DAC04
     • Add links such that skew due to variations is reduced AND nominal skew is unaffected
     • Link = link capacitors + link resistor
     [Figure: a link between nodes u and w, modeled as a resistor R with capacitance C/2 at each end]
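The link model on this slide (a resistor R between nodes u and w, with half of the link capacitance, C/2, at each end) can be sketched as below. This is a minimal sketch assuming a uniform wire with hypothetical per-unit resistance and capacitance; the `Link` class and its fields are illustrative names, not from the original work. Passing `keep_resistor=False` mimics Stage I of the flow, where link resistors are removed but link capacitance is retained.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Link:
    """Hypothetical description of a cross link between clock nodes u and w."""
    u: str
    w: str
    r_per_unit: float  # wire resistance per unit length (assumed uniform)
    c_per_unit: float  # wire capacitance per unit length (assumed uniform)
    length: float


def pi_model(link: Link, keep_resistor: bool = True) -> Tuple[Optional[float], float, float]:
    """Return (R, C_u, C_w): the link resistor and the C/2 lumped at each end.

    With keep_resistor=False the resistor is dropped (returned as None),
    while both C/2 capacitors stay as extra load on u and w.
    """
    total_c = link.c_per_unit * link.length
    r = link.r_per_unit * link.length if keep_resistor else None
    return r, total_c / 2, total_c / 2


if __name__ == "__main__":
    link = Link("u", "w", r_per_unit=0.1, c_per_unit=0.2, length=100.0)
    print(pi_model(link))                       # R plus C/2 at u and at w
    print(pi_model(link, keep_resistor=False))  # Stage I style: capacitance only
```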

  7. Link Insertion for Buffered Clock Tree
     • G. Venkataraman et al., ICCAD05
     • Links need to be inserted between different sub-networks

  8. Motivation
     • There is almost no work on optimizing clock networks with cross links
     • Most previous work on buffer and wire sizing relies on
       – the Elmore delay model
       – continuous sizing
     • The Elmore delay model is inaccurate and can differ from SPICE by a large amount
     • The number of buffer and wire options is small, so rounding a continuous sizing result introduces significant errors
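For reference, the Elmore delay used by prior sizing work is cheap to evaluate on an RC tree: the delay to a node is the sum, over the edges on its path from the root, of each edge resistance times the total capacitance downstream of that edge. The sketch below assumes a hypothetical array encoding in which every node's parent has a smaller index, and it ignores driver resistance. It applies only to trees, so it cannot directly model the cross links, and, as this slide notes, its results can deviate substantially from SPICE.

```python
from typing import List


def elmore_delays(parent: List[int], res: List[float], cap: List[float]) -> List[float]:
    """Elmore delay from the root (node 0) to every node of an RC tree.

    parent[v] is the parent of node v (parent[0] == -1), with parent[v] < v.
    res[v] is the resistance of the edge parent[v] -> v (res[0] is unused).
    cap[v] is the capacitance lumped at node v.
    """
    n = len(parent)
    # Downstream capacitance: fold each node into its parent, leaves first.
    down = list(cap)
    for v in range(n - 1, 0, -1):
        down[parent[v]] += down[v]
    # Delay of a node = delay of its parent + edge R * downstream C.
    delay = [0.0] * n
    for v in range(1, n):
        delay[v] = delay[parent[v]] + res[v] * down[v]
    return delay


if __name__ == "__main__":
    # Root -> node 1 -> node 2 (a two-segment chain) with unit R and C.
    print(elmore_delays([-1, 0, 1], [0.0, 1.0, 1.0], [0.0, 1.0, 1.0]))  # [0.0, 2.0, 3.0]
```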

  9. Our Contributions
     • Support Vector Machine (SVM) is explored to
       – handle the complex delay model issue
       – provide guidance for discrete optimization in a large design space
     • A two-stage hybrid optimization approach is proposed
       – discrete sizing
       – using an accurate delay model

  10. Support Vector Machine (SVM)
     • SVM is well suited for highly nonlinear and high-dimensional data
     • SVM can operate in different modes
       – Classification
       – Regression
       – Ranking
     • For a training set of m samples $(x_1, y_1), \ldots, (x_m, y_m)$, the regression model is $f(x) = \sum_i \alpha_i K(s_i, x)$
     • The kernel function $K(s_i, x)$ corresponds to a dot product in a certain feature space
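Evaluating the regression model $f(x) = \sum_i \alpha_i K(s_i, x)$ reduces to a kernel-weighted sum over the support vectors $s_i$. The sketch below assumes a Gaussian (RBF) kernel $K(s, x) = \exp(-\gamma \lVert s - x \rVert^2)$ with hypothetical support vectors and coefficients; the slide does not say which kernel was used, and, like the slide's formula, the sketch has no bias term.

```python
import math
from typing import Sequence


def rbf_kernel(s: Sequence[float], x: Sequence[float], gamma: float) -> float:
    """Gaussian kernel exp(-gamma * ||s - x||^2) (assumed kernel choice)."""
    sq_dist = sum((si - xi) ** 2 for si, xi in zip(s, x))
    return math.exp(-gamma * sq_dist)


def svm_regress(x: Sequence[float],
                support_vectors: Sequence[Sequence[float]],
                alphas: Sequence[float],
                gamma: float = 1.0) -> float:
    """f(x) = sum_i alpha_i * K(s_i, x), as written on the slide (no bias)."""
    return sum(a * rbf_kernel(s, x, gamma) for s, a in zip(support_vectors, alphas))


if __name__ == "__main__":
    # Hypothetical model: two support vectors in a 2-D sizing space.
    svs = [[1.0, 2.0], [3.0, 1.0]]
    alphas = [0.8, -0.3]
    print(svm_regress([2.0, 2.0], svs, alphas, gamma=0.5))
```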

  11. Skew Quality Function
     $Q = \sum_i \sum_j (t_i - t_j)^2$
     • $t_i$ is the clock delay at leaf node i
     • Q is an overall function that penalizes large skew
     • Clock delay is obtained through SPICE simulations
     • The data is applied to train an SVM model of Q
     • Once an SVM model is built, it is utilized repeatedly, so the training cost is amortized
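The skew cost can be computed directly from the leaf delays $t_i$. The sketch below (hypothetical function names) evaluates the double sum literally and also through the algebraically equivalent form $Q = 2n\sum_i t_i^2 - 2\left(\sum_i t_i\right)^2$, which costs O(n) instead of O(n²) for n leaf nodes. In the flow, the delays would come from SPICE, as the slide states.

```python
from typing import Sequence


def skew_cost(delays: Sequence[float]) -> float:
    """Q = sum_i sum_j (t_i - t_j)^2 over all ordered pairs of leaf delays."""
    return sum((ti - tj) ** 2 for ti in delays for tj in delays)


def skew_cost_fast(delays: Sequence[float]) -> float:
    """Same Q in O(n): 2n * sum(t^2) - 2 * (sum t)^2."""
    n = len(delays)
    s1 = sum(delays)
    s2 = sum(t * t for t in delays)
    return 2 * n * s2 - 2 * s1 * s1


if __name__ == "__main__":
    leaf_delays = [1.0, 2.0, 4.0]  # hypothetical leaf delays
    print(skew_cost(leaf_delays), skew_cost_fast(leaf_delays))  # 28.0 28.0
```

The closed form subtracts two large terms, so when the delays are large compared with their spread the literal double sum is the numerically safer choice.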

  12. Sizing Algorithm
     • The goal of buffer and wire sizing is to minimize the global skew
     • Our approach is to iteratively optimize a portion of the given clock network
     • Compared to simultaneously optimizing the entire network, our approach is more practical
     • Compared to iteratively optimizing a single element, our approach is more efficient at finding a global solution

  13. Buffered Clock Tree with Cross Links
     [Figure: buffered clock tree with cross links; labels show nodes u and v, a segment, and subtrees at level i and level i+1]

  14. Optimization Flow
     Input: clock network
     Stage 1 (remove link resistors):
       Optimization Core1($p_1$, $e_1$, $k_1$, $\varepsilon_1$)
       Optimization Core2($p_1$, $e_{1,2}$, $k_2$, $\varepsilon_2$)
     Stage 2 (add back link resistors):
       Optimization Core1($p_2$, $e_2$, $k_1$, $\varepsilon_2$)
       Optimization Core2($p_2$, $e_{2,2}$, $k_2$, $\varepsilon_2$)
     Each optimization core:
       1. Run SPICE simulation
       2. S ← k components p associated with max delay; S ← S ∪ k components p associated with min delay
       3. Build SVM over S
       4. Size elements e in S using ε
     p: definition of component ($p_1$: subtree; $p_2$: subtree + links)
     e: types of element to be sized ($e_1$: buffers + wires; $e_{1,2}$: wires)
     k: number of components in the optimization
     ε: optimization engine
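Step 2 of the optimization core, which takes the k components tied to the largest delays and adds the k components tied to the smallest delays, might look like the sketch below. It assumes a hypothetical mapping from each component (a subtree, or a subtree plus links) to the SPICE delays of its leaf nodes, and ranks components by their latest and earliest leaf delay; the slide does not specify the ranking rule, so this is one plausible reading.

```python
from typing import Dict, List, Set


def select_components(leaf_delays: Dict[str, List[float]], k: int) -> Set[str]:
    """S = (k components with the largest max leaf delay)
       U (k components with the smallest min leaf delay).

    leaf_delays is a hypothetical map: component name -> its leaf delays.
    """
    by_max = sorted(leaf_delays, key=lambda c: max(leaf_delays[c]), reverse=True)
    by_min = sorted(leaf_delays, key=lambda c: min(leaf_delays[c]))
    selected = set(by_max[:k])   # S <- k components associated with max delay
    selected.update(by_min[:k])  # S <- S U k components associated with min delay
    return selected


if __name__ == "__main__":
    # Hypothetical leaf delays (ps) per subtree after a SPICE run.
    delays = {"A": [1.0, 2.0], "B": [5.0, 6.0], "C": [3.0, 3.5], "D": [0.5, 4.0]}
    print(sorted(select_components(delays, k=1)))  # ['B', 'D']
```

A component can qualify on both lists; using a set keeps it from being sized twice in the same round.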

  15. Optimization Stage I
     Link resistors are removed; link capacitance is retained.
     Step 1 (coarse level):
     • Each wire segment in a subtree is sized uniformly
     • Only one variable is needed for the wires
     • This reduces the total number of variables in each subtree
     • More subtrees can be chosen for optimization
     Step 2 (fine-grained, local level):
     • Individual wires in each subtree are sized differently
     • For each subtree, the number of variables is usually large
     • The number of subtrees chosen for optimization is smaller than in Step 1

  16. Optimization Engine for Stage I
     Input: set S of components to be sized; τ is the error tolerance
     1. while (improve) {
     2.   Partition S into a set G of m groups
     3.   Obtain the average leaf node delay $t_{avg}$ of each group
     4.   Sort the groups in G in non-decreasing order of $t_{avg}$
     5.   for i = 1 to m/2 {
     6.     while (improve)
     7.       Increase $t_{avg}$ of $g_i$ in G by δ
     8.     while (improve)
     9.       Decrease $t_{avg}$ of $g_{m-i+1}$ in G by δ
     10.  }
     11. }
     ILP constraints:
       $\delta - \tau \le \Delta t_l \le \delta + \tau, \ \forall$ leaf node $l$
       $\sum_b x_{i,b} = 1, \ \forall$ buffer $i$
       $\sum_w y_{j,w} = 1, \ \forall$ wire $j$
       $x_{i,b} \in \{0, 1\}, \ y_{j,w} \in \{0, 1\}$
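Two ingredients of this engine can be sketched as follows: pairing the fastest groups with the slowest after sorting by $t_{avg}$, and choosing one discrete size per element so that every leaf delay shifts by δ within the tolerance τ. As assumptions for illustration, the sketch replaces the ILP solver (GLPK in the experiments) with brute-force enumeration, models delay changes as additive per-element effects (a hypothetical `effects` table), and uses the total deviation from δ as the objective, which the slide does not specify. Enumeration is exponential in the number of elements, so it only suits tiny examples.

```python
import itertools
from typing import List, Optional, Sequence, Tuple


def pair_groups(avg_delay: Sequence[float]) -> List[Tuple[int, int]]:
    """Sort groups by average leaf delay (non-decreasing) and return
    (fast, slow) index pairs (g_1, g_m), (g_2, g_{m-1}), ...; the fast group
    of each pair gets +delta and the slow group gets -delta."""
    order = sorted(range(len(avg_delay)), key=lambda g: avg_delay[g])
    m = len(order)
    return [(order[i], order[m - 1 - i]) for i in range(m // 2)]


def choose_sizes(effects: Sequence[Sequence[Sequence[float]]],
                 delta: float, tau: float) -> Optional[Tuple[int, ...]]:
    """Pick exactly one option per element (the one-hot sum = 1 constraints)
    so that every leaf delay change lies in [delta - tau, delta + tau].

    effects[e][o][l] is the (assumed additive) delay change at leaf l when
    element e uses size option o. Returns option indices, or None if infeasible.
    """
    n_leaves = len(effects[0][0])
    best: Optional[Tuple[float, Tuple[int, ...]]] = None
    for choice in itertools.product(*(range(len(opts)) for opts in effects)):
        shifts = [sum(effects[e][o][l] for e, o in enumerate(choice))
                  for l in range(n_leaves)]
        if all(delta - tau <= s <= delta + tau for s in shifts):
            deviation = sum(abs(s - delta) for s in shifts)  # assumed objective
            if best is None or deviation < best[0]:
                best = (deviation, choice)
    return None if best is None else best[1]


if __name__ == "__main__":
    print(pair_groups([5.0, 1.0, 3.0, 4.0]))  # [(1, 0), (2, 3)]
    # Two elements (e.g. a buffer and a wire), two options each, two leaves.
    effects = [[[0.0, 0.0], [2.0, 1.0]],
               [[0.0, 0.0], [1.0, 2.0]]]
    print(choose_sizes(effects, delta=3.0, tau=0.5))  # (1, 1)
```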

  17. Optimization Stage II
     Link resistors are added back, so the clock network topology becomes non-tree.
     • Similar to Stage I, this stage consists of two optimization steps
     • Since the network topology is non-tree, it is not friendly to an ILP formulation
     • The optimization engine is designed using a group migration heuristic
     • Similar to Stage I, the objective is to minimize the skew cost

  18. Optimization Engine for Stage II
     Input: set S of components to be sized
     1. while (true) {
     2.   S' ← S
     3.   while S' is not empty {
     4.     Find move i with max gain $g_i = Q_{i-1} - Q_i$
     5.     $e_i$ ← the element sized in move i
     6.     S' ← S' − {$e_i$} }
     7.   Find l such that the cumulative gain $G_l = \sum_{i=1}^{l} g_i$ is maximized
     8.   If $G_l > 0$, make the l moves on S
     9.   Else break }
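The pass structure of this engine (tentatively apply the best move, lock the moved element, then commit only the prefix of moves with the largest cumulative gain) is the classic group-migration pattern. A minimal sketch follows, assuming each element has a hypothetical list of discrete size options and that `cost` returns the skew cost Q of a sizing; in the flow, Q would come from the SVM model, and the toy cost in the usage example is purely illustrative.

```python
from typing import Callable, List, Sequence


def group_migration(sizes: List[int], options: Sequence[Sequence[int]],
                    cost: Callable[[List[int]], float]) -> List[int]:
    """Minimize cost(sizes) by repeated group-migration passes."""
    sizes = list(sizes)
    while True:
        trial = list(sizes)
        unlocked = set(range(len(sizes)))  # S' <- S
        q_prev = cost(trial)
        moves, gains = [], []
        while unlocked:
            best = None  # (gain, element, option, new cost)
            for e in sorted(unlocked):
                for opt in options[e]:
                    if opt == trial[e]:
                        continue
                    old, trial[e] = trial[e], opt
                    q = cost(trial)
                    trial[e] = old
                    if best is None or q_prev - q > best[0]:
                        best = (q_prev - q, e, opt, q)  # g_i = Q_{i-1} - Q_i
            if best is None:
                break  # no unlocked element has an alternative size
            gain, e, opt, q = best
            trial[e] = opt  # apply the move tentatively, even if gain < 0
            moves.append((e, opt))
            gains.append(gain)
            q_prev = q
            unlocked.discard(e)  # S' <- S' - {e_i}
        # Find l maximizing the cumulative gain G_l = g_1 + ... + g_l.
        best_l, best_g, running = 0, 0.0, 0.0
        for l, g in enumerate(gains, start=1):
            running += g
            if running > best_g:
                best_l, best_g = l, running
        if best_g <= 1e-12:
            return sizes  # G_l <= 0: stop
        for e, opt in moves[:best_l]:
            sizes[e] = opt  # commit the first l moves on S


if __name__ == "__main__":
    target = [3, 1, 2]  # hypothetical toy cost: squared distance to a target sizing

    def toy_cost(s: List[int]) -> float:
        return float(sum((a - b) ** 2 for a, b in zip(s, target)))

    print(group_migration([1, 1, 1], [[1, 2, 3]] * 3, toy_cost))  # [3, 1, 2]
```

Accepting negative-gain moves inside a pass is what lets the heuristic climb out of local minima; only the best prefix is kept, so each committed pass strictly lowers Q.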

  19. Experimental Setup
     • The experiments are performed on ISCAS89 sequential benchmark circuits
     • The circuits are synthesized using SIS and placed with mPL
     • Clock tree construction and link insertion follow G. Venkataraman et al. [ICCAD05]

     Case     # of Sinks   # of Buffers   # of Links
     s9234        135           20             21
     s5378        164           25             30
     s13207       503           77             69
     s15850       566           81             86
     s38584      1428          235             50
     s35932      1728          286            143

  20. Experimental Setup (contd.)
     Model        Vdd     Buffer library       Wire library   $k_1$   $k_2$
     90nm BPTM    1.0 V   16X, 24X, 32X, 48X   1X, 2X, 3X     10-15   4-5
     X: size of the minimum-width buffer (wire)
     • The buffer and wire sizing algorithm is implemented in C
     • The integer linear program is solved using a public-domain solver, GLPK [http://www.gnu.org/software/glpk/]
     • The Support Vector Machine binaries are downloaded from [http://svmlight.joachims.org/]
     • The experiments are performed on two 3.2 GHz dual-core Intel Xeon processors with 8 GB of memory
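SVMlight reads training data as one example per line: the target value followed by `index:value` feature pairs, with feature indices starting at 1 in increasing order; its `svm_learn` program is switched to regression with `-z r` (e.g. `svm_learn -z r train.dat model`). A minimal sketch for writing (sizing vector, Q) samples in that format is below. Encoding each element's size as its width multiple (e.g. 16 for a 16X buffer) is a hypothetical choice, not necessarily the authors' feature encoding.

```python
import io
from typing import Iterable, Sequence, TextIO, Tuple


def svmlight_line(features: Sequence[float], target: float) -> str:
    """Format one example as '<target> 1:<f1> 2:<f2> ...'."""
    pairs = " ".join(f"{i}:{v:.12g}" for i, v in enumerate(features, start=1))
    return f"{target:.12g} {pairs}"


def write_training_set(samples: Iterable[Tuple[Sequence[float], float]],
                       out: TextIO) -> None:
    """Write (sizing vector, skew cost Q) samples, one example per line."""
    for features, q in samples:
        out.write(svmlight_line(features, q) + "\n")


if __name__ == "__main__":
    # Hypothetical samples: [buffer width, wire width] -> SPICE-based Q.
    buf = io.StringIO()
    write_training_set([([16, 1], 12.5), ([24, 2], 8.75)], buf)
    print(buf.getvalue(), end="")
    # 12.5 1:16 2:1
    # 8.75 1:24 2:2
```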

  21. Comparison between SPICE, SVM and Elmore Delay
     [Figure: comparison of SPICE, SVM and Elmore delay]

  22. Experimental Results
     • The experiments are done in SPICE to compare skew and power
     • We compare three approaches for the non-tree clock network
       – Tree + Link
       – Tree + Link + Sizing (w/o SVM)
       – Tree + Link + Sizing (w/ SVM)
     • Our approach is also suitable for optimizing a clock tree network, so we also simulated the clock tree with three approaches
       – Tree
       – Tree + Sizing (w/o SVM)
       – Tree + Sizing (w/ SVM)

  23. Global Skew for Non-tree Clock Network
     [Figure: Result for Normalized Global Skew. Normalized global skew per test case (s9234, s5378, s13207, s15850, s38584, s35932, Average) for Tree + Link, Tree + Link + Sizing (w/o SVM) and Tree + Link + Sizing (w/ SVM)]
