
Grid-to-Ports Clock Routing for High Performance Microprocessor Designs



  1. Grid-to-Ports Clock Routing for High Performance Microprocessor Designs
     Haitong Tian#, Wai-Chung Tang#, Evangeline F.Y. Young# and C.N. Sze*
     # Department of Computer Science and Engineering, The Chinese University of Hong Kong
     * IBM Austin Research Laboratory
     ISPD ’11, Santa Barbara, USA, March 28, 2011

  2. Outline
     - Introduction
     - Problem Formulation
     - Routing Algorithm
     - Experimental Results
     - Conclusion

  3. Clock Distribution Categories
     - Clock distribution is a very important issue
     - Buffered and unbuffered trees
       - Used in various ASICs
       - Supported by many physical design tools
       - See Tsay TCAD’93, Xi DAC’95
     - Non-tree structure with crosslinks
       - Intended for reducing clock skews
       - See Rajaram DAC’04, TCAD’06
     - Grid and buffered trees
       - High performance processors
       - Clock structures are sometimes designed manually
       - See Shelar ISPD’09, TCAD’10, Guru VLSI Circuits’10

  4. High Performance Clock Distribution
     - Clock network in high performance microprocessors
       - Distributed as a global grid followed by buffered trees
       - See Shelar ISPD’09, TCAD’10, Guru VLSI Circuits’10
     - This paper focuses on the post-grid clock distribution area
     [Figure: clock distribution diagram with labels External Clock, PLL, Grid Buffers, Clock Grid, Regional Clock Buffers, Local Clock Buffers, and Post-grid Clock Distribution Network]

  5. Post-grid Clock Distribution
     - In our modeling
       - Entire chip divided into several layout areas
       - Each layout area contains many blocks
       - Each block contains standard cells and/or macros
     - Each layout area contains
       - 100s-1000s clock ports
       - Grid wires reserved for clock routing
         - Typically upper metal layers
     [Figure: layout region with labels Global Grid, Grid Buffer, Blocks, Reserved Tracks, Port, Sequential, Local Clock Buffer, and Reserved Multilayer Tracks]

  6. Motivations
     - Clock distribution of microprocessor:
       - Crucial importance
       - Major source of power dissipation
     - High capacitance usage
       - 18.1% of total clock capacitance [1]
       - See Pham Solid State Circuits’06
     - Manually designed in practice
       - Hard to satisfy delay/slew constraints
       - Time to market
       - See Shelar ISPD’09, TCAD’10
     [1]: D. Pham, T. Aipperspach, D. Boerstler, M. Bolliger, R. Chaudhry, D. Cox, P. Harvey, H. Hofstee, C. Johns, et al. Overview of the architecture, circuit design, and physical implementation of a first-generation cell processor. IEEE Journal of Solid-State Circuits, 41(1):179–196, Jan. 2006.

  7. Outline
     - Introduction
     - Problem Formulation
     - Routing Algorithm
     - Experimental Results
     - Conclusion

  8. Problem Formulation
     - Input
       - A set of reserved tracks
       - Locations and capacitances of ports P
       - Different types of wires on each metal layer
       - Delay limit D, slew limit S
     - Output
       - A clock network (may be a non-tree structure)
     - Objective
       - Connecting every port to the source
       - Satisfying delay and slew constraints
       - Minimizing capacitance usage
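
The formulation on this slide can be written compactly as a constrained minimization. The notation below is an assumption added for illustration and does not come from the slides: N stands for the routed clock network, c(e) for the capacitance of wire segment e, and delay_N(p), slew_N(p) for the delay and slew seen at port p.

```latex
\begin{aligned}
\min_{N}\quad & \sum_{e \in N} c(e) \\
\text{s.t.}\quad & \mathrm{delay}_N(p) \le D \qquad \forall p \in P, \\
                 & \mathrm{slew}_N(p) \le S \qquad \forall p \in P, \\
                 & \text{every } p \in P \text{ is connected to the source grid by wires in } N .
\end{aligned}
```

Port capacitances are fixed inputs, so including or excluding them in the sum would not change which network is optimal.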

  9. Post-grid Clock Routing
     [Figure: 3D plot with a Layer axis (3 to 7) and two planar axes (0 to 1800 and 0 to 1500)]

  10. Outline
      - Introduction
      - Problem Formulation
      - Routing Algorithm
      - Experimental Results
      - Conclusion

  11. Overall Algorithm
      - Critical ports
        - Ports with large capacitance or far away from the source
      - Path expansion algorithm
        - Elmore-delay driven
        - Expanding in some selected directions
      - Post-processing
        - Wire replacement
        - Topology refinement
      - Iterations
        - The overall algorithm is repeatedly invoked
        - May fail when number of iterations > K (user specified)

  12. Delay-driven Path Expansion Algorithm
      - Basic steps
        - Simultaneously expand from all ports
        - Select the path with the minimum Elmore delay to further expand
        - Connect the ports to the source once the path reaches the source grid
        - Check delay/slew constraints
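
Since the expansion is keyed on Elmore delay, a small sketch of how that delay is computed on an RC tree may help. It assumes a lumped model in which each node carries one capacitance and each edge one resistance. The function name `elmore_delays` and the dictionary layout are hypothetical, not taken from the paper.

```python
from collections import defaultdict

def elmore_delays(parent, res, cap, root):
    """Return the Elmore delay from root to every node of an RC tree.

    parent[v] -> parent of node v (the root maps to None)
    res[v]    -> resistance of the wire between parent[v] and v
    cap[v]    -> capacitance lumped at node v (assumed model)
    """
    children = defaultdict(list)
    for v, p in parent.items():
        if p is not None:
            children[p].append(v)

    # Pre-order walk: every node appears after its parent.
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children[v])

    # Downstream capacitance: a node's own cap plus all of its descendants'.
    down = dict(cap)
    for v in reversed(order):
        if parent[v] is not None:
            down[parent[v]] += down[v]

    # Each edge adds (edge resistance) * (capacitance it has to charge).
    delay = {root: 0.0}
    for v in order[1:]:
        delay[v] = delay[parent[v]] + res[v] * down[v]
    return delay

# Toy tree: S drives C1, which drives ports P1 and P2 (values are made up).
parent = {"S": None, "C1": "S", "P1": "C1", "P2": "C1"}
res = {"C1": 2.0, "P1": 3.0, "P2": 1.0}
cap = {"S": 0.0, "C1": 1.0, "P1": 4.0, "P2": 2.0}
delays = elmore_delays(parent, res, cap, "S")
```

In this toy tree the edge S-C1 charges all 7 units of capacitance, so it contributes 14 to both ports; P1 then adds 3 * 4 and P2 adds 1 * 2.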

  13. A Routing Example
      - Initially, the heap is empty
      - First iteration (simultaneously expand from all ports)
        - Heap = {(P1,P2); (P1,C1); (P2,P3); (P2,P1); (P3,P2); (P3,C2)}
      - Second iteration (P1,P2)
        - Heap = {(P3,C2); (P1,P2,P3); (P1,C1); (P2,P3); (P2,P1); (P3,P2)}
      - Third iteration (P3,C2)
        - Heap = {(P3,C2,S2); (P1,P2,P3); (P1,C1); (P2,P3); (P2,P1); (P3,P2)}

  14. A Routing Example
      - Fourth iteration (identify chain paths)
        - Heap = {(P1,P2,P3); (P1,C1); (P2,P3); (P2,P1)}
        - Chain path = {(P1,P2,P3); (P2,P3)}
      - Fifth iteration (P2,P3)
        - Heap = {(P1,P2,P3); (P1,C1)}
        - Chain path = {(P1,P2,P3); (P1,P2)}
      - Sixth iteration (P1,P2)
        - Heap = {}, chain path = {}
      - Final result
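
The heap-driven expansion in this example can be approximated with a much simpler sketch. It is not the authors' algorithm: it uses a plain additive edge cost as a stand-in for Elmore delay, and it omits chain-path bookkeeping and the delay/slew checks. The helper names and the edge costs are made up; only the node names follow the slide.

```python
import heapq

def undirected(edge_list):
    """Build an adjacency map from (u, v, cost) triples."""
    graph = {}
    for u, v, cost in edge_list:
        graph.setdefault(u, []).append((v, cost))
        graph.setdefault(v, []).append((u, cost))
    return graph

def expand_paths(graph, ports, sources):
    """Expand from all ports at once; return {port: cheapest path to a source}."""
    heap = [(0.0, [p]) for p in ports]   # (accumulated cost, path so far)
    heapq.heapify(heap)
    settled = set()                      # (port, node) pairs already expanded
    routed = {}
    while heap and len(routed) < len(ports):
        cost, path = heapq.heappop(heap) # always the cheapest pending path
        port, node = path[0], path[-1]
        if port in routed or (port, node) in settled:
            continue
        settled.add((port, node))
        if node in sources:              # path reached the source grid
            routed[port] = path
            continue
        for nxt, w in graph.get(node, []):
            if (port, nxt) not in settled:
                heapq.heappush(heap, (cost + w, path + [nxt]))
    return routed

# Hypothetical track graph with two source nodes S1 and S2.
graph = undirected([
    ("P1", "C1", 2.0), ("C1", "S1", 3.0), ("P1", "P2", 1.0),
    ("P2", "P3", 1.0), ("P3", "C2", 1.0), ("C2", "S2", 1.0),
])
routes = expand_paths(graph, ["P1", "P2", "P3"], {"S1", "S2"})
```

With these made-up costs, P3 reaches S2 through C2, and P1 and P2 find it cheaper to chain through P3 than to go through C1, which mirrors the shape of the example even though the real algorithm ranks paths by Elmore delay.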

  15. Post-processing Techniques
      - Wire replacement
        - Two types of wires
          - Capacitance/resistance tradeoff
        - Procedures
          - Identify port P_l with the largest Elmore delay
          - Replace wires in a bottom-up style
          - Check delay/slew constraints
      - Wire replacement (example)
        - Port with largest delay: P5
        - Replace edge P1C1
        - Replace edge P4C2
        - Replace edges P2P3, P3C1
        - Replace P5C3, C3C2, C2C1, C1S1
      [Figure: example network with source S1, nodes C1 to C3, and ports P1 to P5, shown at successive replacement steps]
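
A minimal sketch of bottom-up wire replacement on a single source-to-port chain follows. The two wire types and their per-unit values are hypothetical, the delay uses a pi-model Elmore estimate (half of each segment's capacitance at the segment itself), and slew checking is left out. This illustrates the idea only and is not the authors' implementation.

```python
# Hypothetical wire library: (resistance per unit, capacitance per unit).
WIRES = {"thin": (2.0, 1.0), "wide": (1.0, 1.5)}

def path_delay(lengths, types, load):
    """Elmore delay of a chain; segment 0 touches the source, the last one the port."""
    delay, downstream = 0.0, load
    # Walk from the port back to the source, accumulating downstream capacitance.
    for length, kind in zip(reversed(lengths), reversed(types)):
        r = WIRES[kind][0] * length
        c = WIRES[kind][1] * length
        delay += r * (c / 2 + downstream)
        downstream += c
    return delay

def replace_bottom_up(lengths, types, load, limit):
    """Swap segments to the wide wire, port side first, until delay <= limit."""
    types = list(types)
    for i in reversed(range(len(types))):
        if path_delay(lengths, types, load) <= limit:
            break
        types[i] = "wide"
    # The caller still has to compare the returned delay against the limit.
    return types, path_delay(lengths, types, load)

before = path_delay([1.0, 1.0, 1.0], ["thin"] * 3, 2.0)
after_types, after = replace_bottom_up([1.0, 1.0, 1.0], ["thin"] * 3, 2.0, 19.0)
```

In this toy case widening only the segment next to the port barely helps (21 to 20.75), because the wider wire adds capacitance that every upstream segment must charge. That is the capacitance/resistance tradeoff named on the slide.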

  16. Post-processing Techniques
      - Topology refinement
        - Procedures
          - Disconnect a port P
          - Expand P towards all directions
          - Select paths with smaller capacitance
          - Check delay/slew constraints
      - Topology refinement (example)
        - Elmore delay: P5 > P4 > P6 > P2 > P1 > P3 > P7
        - Sequentially process all the ports
      [Figure: example network with sources S1 and S2, nodes C1 to C5, and ports P1 to P7, shown at successive refinement steps]

  17. Non-tree Extensions
      - A small number of ports have exceptionally large capacitances
        - The delay of its shortest path exceeds the delay limit D
      - Procedures
        - Establish a shortest path for p
        - Find a second shortest path
        - Target delay not met? Add all useful crosslinks
        - Target delay not met? Do the same thing for the parent node of p
      - Non-tree extensions (example)
        - Connect p to S1
        - Find a second source S2
        - Add crosslinks
        - Find a third source S3
        - Add crosslinks

  18. Outline
      - Introduction
      - Problem Formulation
      - Routing Algorithm
      - Experimental Results
      - Conclusion

  19. Experiment Setup
      - Environment
        - Implemented in C++
        - Run on Linux server
        - Intel Pentium 4 3.2GHz
        - 2GB RAM
      - Delay setup: 5 ps
      - Slew setup: input: 10 ps; output: 15 ps
      - Benchmarks
        - 3 test cases are provided by industry
        - 11 test cases are from the ISPD 2010 Clock Network Synthesis Contest
      - Comparisons
        - Compared with TG, which was proposed by Shelar in ISPD’09, TCAD’10

  20. Tree Growing Algorithm
      - Proposed in R. Shelar ISPD’09, TCAD’10
      - Expand from the source
      - Delay/slew constraints
      - Greedy expansion from the source
        - Edges with the smallest capacitance will be added into the network
      - Tree Growing Algorithm (example)
        - Add S1C1, S2C2
        - Add C2P3
        - Add C1P1
        - Add P3P2
      [Figure: example network with sources S1 and S2, nodes C1 and C2, and ports P1 to P3, shown at successive growing steps]
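
The greedy growth described on this slide resembles a Prim-style expansion seeded from the source nodes. The sketch below is a simplified reading of the slide, not Shelar's implementation: it always adds the pending edge with the smallest capacitance, skips the delay/slew checks, and does not prune branches that end at non-port nodes. Edge capacitances are made up; node names follow the slide.

```python
import heapq

def undirected(edge_list):
    """Build an adjacency map from (u, v, capacitance) triples."""
    graph = {}
    for u, v, c in edge_list:
        graph.setdefault(u, []).append((v, c))
        graph.setdefault(v, []).append((u, c))
    return graph

def grow_tree(graph, sources, ports):
    """Grow from the sources, smallest-capacitance edge first; return added edges."""
    reached = set(sources)
    heap = [(c, u, v) for u in sources for v, c in graph.get(u, [])]
    heapq.heapify(heap)
    edges = []
    missing = set(ports) - reached       # ports not yet connected
    while heap and missing:
        c, u, v = heapq.heappop(heap)
        if v in reached:                 # edge would close a loop, skip it
            continue
        reached.add(v)
        missing.discard(v)
        edges.append((u, v))
        for w, cw in graph.get(v, []):
            if w not in reached:
                heapq.heappush(heap, (cw, v, w))
    return edges

graph = undirected([
    ("S1", "C1", 1.0), ("S2", "C2", 1.0), ("C2", "P3", 2.0),
    ("C1", "P1", 3.0), ("P3", "P2", 4.0), ("P1", "P2", 6.0),
])
added = grow_tree(graph, ["S1", "S2"], ["P1", "P2", "P3"])
```

With these capacitances the edges are added in the same order as the slide's example: S1C1 and S2C2, then C2P3, C1P1, and P3P2.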

  21. Comparisons: capacitance
      - Without post-processing: 18.3% improvement
        [Figure: "Capacitance (without post-processing techniques)", capacitance (fF) for test cases 1 to 14, TG vs. Ours]
      - With topology refinement: 24.6% improvement
        [Figure: "Capacitance (with topology refinement)", capacitance (fF) for test cases 1 to 14, TG vs. Ours]
