Large-Scale Circuit Placement: The Gap and Promise
Jason Cong
Computer Science Department, University of California, Los Angeles
cong@cs.ucla.edu
Contributors: Chin-Chih Chang, Kenton Sze, Tim Kong, Michail Romesis, Joe Shinnerl, Min Xie, Xin Yuan
Outline
• Optimality and scalability study of the placement problem: gap analysis
• Research on the multilevel large-scale placement problem
• Our research plan
Optimality and Scalability Study: Motivation
• Lack of significant progress in wirelength reduction
• Rate of reduction is about 5-10% every 2-3 years
• Latest developments in placement differ mainly in runtime
• Where do we stand?
• How much room is there for further improvement?
• Will existing placement engines scale well to 10+M gate designs?
• Need to quantify the optimality and scalability of state-of-the-art placement engines
Optimality and Scalability Study: Motivation (II)
• Most work compares only with existing heuristics
• Use benchmarks based on real designs
  - ISPD98 [C. Alpert, 1998]
• Use synthetic benchmarks
  - circ and gen [M. D. Hutton et al, 1998]
  - gnl [D. Stroobandt et al, 2000]
• Little understanding of the gap from the optimal
Optimality and Scalability Study: Related Work
• Quantified Suboptimality of VLSI Layout Heuristics [L. Hagen et al, 1995]
• Construct scaled instance with known upper bound
• Over 10% area suboptimality in TimberWolf
• Notable wirelength suboptimality in GORDIAN-L
• But test cases are small; the largest netlist is less than 40K
Our Contribution: Placement Example Construction with Known Optimal Wirelength
• Optimality and Scalability Study of Existing Placement Algorithms [C. Chang et al, 2003]
• Construct instances with known optimal wirelength using the characteristics of the original problem
• Studied the optimality and scalability of existing algorithms on the constructed instances
Construction of Placement Examples with Known Optimal Wirelength (PEKO Examples)
• Input
  - Desired number of placeable modules $t$
  - Net Distribution Vector (NDV) $D = (d_2, d_3, \ldots, d_p)$, where $d_k$ is the number of $k$-pin nets in the circuit
  - $t$ and $D$ are extracted from a real circuit
• Output
  - Cell library $L$
  - Netlist $N$ with known optimal wirelength
• Constraint
  - $N$ has $D$ as its NDV
Our Algorithm for Constructing PEKO Examples
• All the modules are of equal size, and there is no space between rows or between adjacent modules
• For 2-pin nets, connect any two adjacent modules
• For each $n$-pin net, connect the $n$ modules in a rectangular region close to a square, i.e., the length of each side is close to $\sqrt{n}$
• The wirelength of each $n$-pin net is given by $\lceil \sqrt{n} \rceil + \lceil n / \lceil \sqrt{n} \rceil \rceil - 2$
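The near-square region rule can be sketched in a few lines. This is a minimal illustration, not the authors' generator: the function name `peko_net_region` is hypothetical, and it assumes unit-size modules with wirelength measured as the half-perimeter between module centers, so that two adjacent modules are at distance 1.

```python
import math

def peko_net_region(n: int) -> tuple[int, int, int]:
    """Return (width, height, wirelength) of the near-square region for an n-pin net.

    Assumption: modules are unit squares and wirelength is the half-perimeter
    of the bounding box of the module centers.
    """
    if n < 2:
        raise ValueError("a net needs at least two pins")
    width = math.isqrt(n - 1) + 1   # ceil(sqrt(n)) without floating point
    height = -(-n // width)         # ceil(n / width)
    return width, height, (width - 1) + (height - 1)

for n in range(2, 8):
    print(n, peko_net_region(n))
```

The region may contain more than n cells (a 7-pin net uses a 3 x 3 region); only n of them are connected by the net.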
Illustration: PEKO Example Construction
Input: $t = 64$, $D = (d_2 = 34, d_3 = 20, d_4 = 7, d_5 = 4, d_6 = 2, d_7 = 1)$
• #2-pin nets = 34, WL = 34
• #3-pin nets = 20, WL = 40
• #4-pin nets = 7, WL = 14
• #5-pin nets = 4, WL = 12
• #6-pin nets = 2, WL = 6
• #7-pin nets = 1, WL = 4
• Total WL = 110
• Method first conceived by K. Boese (1995), but not implemented
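The totals in this illustration can be checked with a short script. It is a sketch under the assumption that each k-pin net contributes ceil(sqrt(k)) + ceil(k / ceil(sqrt(k))) - 2 units of wirelength, which reproduces the per-degree numbers on the slide; the names `optimal_wl` and `ndv` are illustrative only.

```python
import math

def optimal_wl(k: int) -> int:
    """Assumed optimal wirelength of one k-pin net in a near-square region."""
    side = math.isqrt(k - 1) + 1      # ceil(sqrt(k))
    return side + -(-k // side) - 2   # side + ceil(k / side) - 2

# Net distribution vector from the slide: degree -> number of nets.
ndv = {2: 34, 3: 20, 4: 7, 5: 4, 6: 2, 7: 1}

per_degree = {k: d_k * optimal_wl(k) for k, d_k in ndv.items()}
total = sum(per_degree.values())

print(per_degree)  # {2: 34, 3: 40, 4: 14, 5: 12, 6: 6, 7: 4}
print(total)       # 110
```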
White Space Insertion
• Need for white space
  - Mimic real designs
  - Ease legalization
• Option 1: expanding one dimension of the chip
• Option 2: removing some of the nets
Four New Suites of Placement Examples with Known Optimal Wirelength
• Module number $t$ and NDV extracted from ISPD98 [C. Alpert, 1998]
• Two suites without pads (suite1 and suite2)
  - suite2 is derived by scaling $t$ and NDV by a factor of 10
• Two suites with pads (suite3 and suite4)
  - suite4 is derived by scaling $t$ and NDV by a factor of 10
• 15% white space by expanding one dimension of the chip
URL: http://ballade.cs.ucla.edu/~pubbench/peko.htm
PEKO Characteristics

PEKO Suite1 (12.5k – 210k cells)
ckt      #cell    #net     #row  Optimal WL
Peko01   12506    13865    113   8.14E+05
Peko02   19342    19325    140   1.26E+06
Peko03   22853    27118    152   1.50E+06
Peko04   27220    31683    166   1.75E+06
Peko05   28146    27777    169   1.91E+06
Peko06   32332    34660    181   2.06E+06
Peko07   45639    47830    215   2.88E+06
Peko08   51023    50227    227   3.14E+06
Peko09   53110    60617    231   3.64E+06
Peko10   68685    74452    263   4.73E+06
Peko11   70152    81048    266   4.71E+06
Peko12   70439    76603    266   5.00E+06
Peko13   83709    99176    290   5.87E+06
Peko14   147088   152255   385   9.01E+06
Peko15   161187   186225   402   1.15E+07
Peko16   182980   189544   429   1.25E+07
Peko17   184752   188838   431   1.34E+07
Peko18   210341   201648   460   1.32E+07

PEKO Suite2 (125k – 2.1M cells)
ckt         #cell     #net      #row  Optimal WL
Peko01x10   125060    138650    335   8.14E+06
Peko02x10   193420    193250    441   1.26E+07
Peko03x10   228530    271180    479   1.50E+07
Peko04x10   272200    316830    523   1.75E+07
Peko05x10   281460    277770    532   1.91E+07
Peko06x10   323320    346600    570   2.06E+07
Peko07x10   456390    478300    677   2.88E+07
Peko08x10   510230    502270    715   3.14E+07
Peko09x10   531100    606170    730   3.64E+07
Peko10x10   686850    744520    830   4.73E+07
Peko11x10   701520    810480    839   4.71E+07
Peko12x10   704390    766030    840   5.00E+07
Peko13x10   837090    991760    916   5.87E+07
Peko14x10   1470880   1522550   1214  9.01E+07
Peko15x10   1611870   1862250   1271  1.15E+08
Peko16x10   1829800   1895440   1354  1.25E+08
Peko17x10   1847520   1888380   1360  1.34E+08
Peko18x10   2103410   2016480   1451  1.32E+08
Tested Four State-of-the-Art Placers
• Capo [A. E. Caldwell et al, 2000]
  - Based on a multilevel partitioner
  - Aims to enhance routability
• Dragon [M. Wang et al, 2000]
  - Uses hMetis for initial partition
  - SA with bin-based swapping
• mPL [T. Chan et al, 2000]
  - Nonlinear programming on the coarsest level
  - Goto-based relaxation
• QPlace [Cadence Inc.]
  - Quadratic programming
  - Component of Silicon Ensemble
Experiment with State-of-the-Art Placers Using PEKO Suite1
[Figure: two plots against #cells (0 to 250,000) for Dragon v.2.20, Capo v.8.0, mPL v.1.2, and QPlace v.5.1.55. One plot shows wirelength as a multiple of optimal (1.00 to 2.80); the other shows runtime in seconds (0 to 45,000).]
• Existing algorithms are 66-153% away from the optimal on PEKO
• On examples with pads
  - mPL and QPlace show improvement of 12% and 10%, respectively
  - Dragon and Capo do not benefit much from the additional information
• There is significant room for improvement in placement algorithms!
Experiment with State-of-the-Art Placers Using PEKO Suite1 & Suite2
[Figure: two plots against #cells (10,000 to 10,000,000) for Dragon v.2.20, Capo v.8.0, mPL v.1.2, and QPlace v.5.1.55. One plot shows wirelength as a multiple of optimal (1.00 to 2.80); the other shows runtime in seconds (up to 60,000).]
• Capo, QPlace, and mPL scale well in runtime
• Average solution quality of each tool deteriorates by an additional 4% to 25% when the problem size increases by a factor of 10
• QoR of the existing placement algorithms can be 80%-180% away from the optimal for large designs
Limitation of PEKO Examples
• Optimal solution includes local nets only
  - Unlikely for real designs
• Measure wirelength only
  - Timing and routability are important objectives for placement algorithms as well
Impact of Global Connections in Real Examples
• Produced by Dragon on ISPD98
• The wirelength contribution from global connections can be significant!
• Need to consider the impact of global connections

circuit  height  width  WL of longest net  WL contribution of longest 10%
ibm01    8158    4530   7148               51%
ibm02    8158    6430   14224              46%
ibm03    8158    6740   10624              58%
ibm04    8158    9140   15171              53%
ibm05    8158    11055  19064              47%
ibm06    8158    8715   13966              61%
ibm07    8158    14605  14051              51%
ibm08    8158    15895  16142              60%
ibm09    8158    16395  13780              55%
ibm10    8158    27890  30755              53%
ibm11    16350   10925  19234              59%
ibm12    16350   15545  26748              52%
ibm13    16350   12230  19539              59%
ibm14    16350   25475  26370              61%
ibm15    16350   23785  27284              63%
ibm16    16350   34015  42860              59%
ibm17    16283   38895  45686              56%
ibm18    16350   37065  52846              64%
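One way to compute a "contribution of the longest 10%" figure from a placed netlist is sketched below. It assumes the metric means the share of total wirelength carried by the longest 10% of nets, with the net count rounded up; the slide does not spell out the exact definition, and the function name `top_share` and the sample lengths are made up for illustration.

```python
import math

def top_share(net_lengths, fraction=0.10):
    """Share of total wirelength contributed by the longest `fraction` of nets."""
    total = sum(net_lengths)
    if not net_lengths or total == 0:
        raise ValueError("need at least one net with nonzero length")
    ordered = sorted(net_lengths, reverse=True)
    # Assumption: round the number of selected nets up, and take at least one.
    count = max(1, math.ceil(fraction * len(ordered)))
    return sum(ordered[:count]) / total

# Toy data: one long (global) net among nine short (local) nets.
lengths = [10, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(f"{top_share(lengths):.0%}")
```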
Placement Examples with Known Upper Bounds (PEKU)
• Extend PEKO by introducing non-local nets to mimic global connections
• All the modules are of equal size, and there is no space between rows or between adjacent modules
• For nets of degree $i$, a subset is generated by randomly connecting $i$ modules; the rest are generated optimally as in PEKO
Placement Examples with Known Upper Bounds (PEKU)
Input: $t = 64$, $D = (d_2 = 34, d_3 = 20, d_4 = 7, d_5 = 4, d_6 = 2, d_7 = 1)$, $\alpha = 0.2$
• Generate 28 2-pin nets optimally and 6 randomly
• Generate 16 3-pin nets optimally and 4 randomly
• Generate 6 4-pin nets optimally and 1 randomly
• Generate 4 5-pin nets optimally
• Generate 2 6-pin nets optimally
• Generate 1 7-pin net optimally
• Total WL = 160
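The split between optimally and randomly generated nets in this example is consistent with taking floor(α · d_k) random nets per degree, which is an inference from the numbers on the slide rather than a stated rule. The sketch below applies that assumption and also shows how the half-perimeter wirelength of one random net could be measured on the 8 x 8 grid; `split_ndv` and `random_net_hpwl` are hypothetical names.

```python
import math
import random

def split_ndv(ndv, alpha):
    """Split each d_k into (optimal, random) counts, assuming floor(alpha * d_k) random nets."""
    optimal, rand = {}, {}
    for k, d_k in ndv.items():
        r = math.floor(alpha * d_k + 1e-9)  # small epsilon guards against float round-off
        rand[k] = r
        optimal[k] = d_k - r
    return optimal, rand

def random_net_hpwl(k, rows, cols, rng):
    """Half-perimeter wirelength of a net joining k distinct random cells of a rows x cols grid."""
    cells = rng.sample(range(rows * cols), k)
    xs = [c % cols for c in cells]
    ys = [c // cols for c in cells]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

ndv = {2: 34, 3: 20, 4: 7, 5: 4, 6: 2, 7: 1}
optimal, rand = split_ndv(ndv, alpha=0.2)
print(optimal)  # {2: 28, 3: 16, 4: 6, 5: 4, 6: 2, 7: 1}
print(rand)     # {2: 6, 3: 4, 4: 1, 5: 0, 6: 0, 7: 0}

rng = random.Random(0)  # fixed seed only to make the sketch repeatable
sample_wl = random_net_hpwl(2, 8, 8, rng)
```

Summing the known wirelength of the optimally generated nets and the measured wirelength of the random nets in the construction placement gives an upper bound on the optimal wirelength; the value of 160 on the slide depends on the particular random nets drawn there.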
Placement Examples with Global Connections Only (G-PEKU)
Input: $t = 64$
• Each net connects the modules of either a row or a column
• Obvious upper bound
  - Sum the length of each row and column
• Similar to datapath examples
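As a rough illustration of the row-and-column bound, the sketch below assumes a square grid of unit-size modules, exactly one net per row and one per column, and wirelength measured between module centers. None of these details are given on the slide, so the resulting number is only an example of the calculation; `gpeku_upper_bound` is a hypothetical name.

```python
def gpeku_upper_bound(rows: int, cols: int) -> int:
    """Sum of row-net and column-net lengths on a rows x cols grid of unit modules."""
    row_total = rows * (cols - 1)  # each row net spans cols - 1 unit steps
    col_total = cols * (rows - 1)  # each column net spans rows - 1 unit steps
    return row_total + col_total

# t = 64 modules arranged as an 8 x 8 grid (assumed layout).
bound = gpeku_upper_bound(8, 8)
print(bound)  # 112
```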
PEKU Suite
• Module numbers and NDVs extracted from ISPD98
• Remove connections with pads
• Vary $\alpha$ from 0 to 10%
• 15% white space by expanding one dimension of the chip