Grid-to-Ports Clock Routing for High Performance Microprocessor Designs

Haitong Tian #, Wai-Chung Tang #, Evangeline F.Y. Young # and C.N. Sze *
# Department of Computer Science and Engineering, The Chinese University of Hong Kong
* IBM Austin Research Laboratory
ISPD ’11, Santa Barbara, USA, March 28, 2011

Outline
- Introduction
- Problem Formulation
- Routing Algorithm
- Experimental Results
- Conclusion

Clock Distribution Categories
- Clock distribution is a very important issue
- Buffered and unbuffered trees
  - Used in various ASICs
  - Supported by many physical design tools
  - See Tsay TCAD’93, Xi DAC’95
- Non-tree structures with crosslinks
  - Intended for reducing clock skew
  - See Rajaram DAC’04, TCAD’06
- Grid and buffered trees
  - High performance processors
  - Clock structures are sometimes designed manually
  - See Shelar ISPD’09, TCAD’10, Guru VLSI Circuits’10

High Performance Clock Distribution
- Clock network in high performance microprocessors
  - Distributed as a global grid followed by buffered trees
  - See Shelar ISPD’09, TCAD’10, Guru VLSI Circuits’10
- This paper focuses on the post-grid clock distribution area
[Figure: post-grid clock distribution. Labels: External Clock, PLL, Grid Buffer, Clock Grid, Post-grid Clock Distribution Network, Regional Clock Buffers, Local Clock Buffers]

Post-grid Clock Distribution
- In our modeling
  - Entire chip divided into several layout areas
  - Each layout area contains many blocks
  - Each block contains standard cells and/or macros
- Each layout area contains
  - 100s-1000s of clock ports
  - Grid wires reserved for clock routing
  - Typically upper metal layers
[Figure labels: Global Grid, Grid Buffer, Blocks, Reserved Tracks, Port, Sequential, Local Clock Buffer, Layout Region, Reserved multilayer tracks]

Motivations
- Clock distribution of microprocessors:
  - Crucial importance
  - Major source of power dissipation
- High capacitance usage
  - 18.1% of total clock capacitance [1]
  - See Pham Solid State Circuits’06
- Manually designed in practice
  - Hard to satisfy delay/slew constraints
  - Time to market
  - See Shelar ISPD’09, TCAD’10

[1]: D. Pham, T. Aipperspach, D. Boerstler, M. Bolliger, R. Chaudhry, D. Cox, P. Harvey, H. Hofstee, C. Johns, et al. Overview of the architecture, circuit design, and physical implementation of a first-generation cell processor. IEEE Journal of Solid-State Circuits, 41(1):179–196, Jan. 2006.

Outline
- Introduction
- Problem Formulation
- Routing Algorithm
- Experimental Results
- Conclusion

Problem Formulation
- Input
  - A set of reserved tracks
  - Locations and capacitances of ports P
  - Different types of wires on each metal layer
  - Delay limit D, slew limit S
- Output
  - A clock network (may be a non-tree structure)
- Objective
  - Connect every port to the source
  - Satisfy delay and slew constraints
  - Minimize capacitance usage
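
Read as an optimization problem, the formulation above might be written as follows. This is only a sketch: the symbols N (the routed network), E(N) (its wire segments), c_e (the capacitance of segment e), delay_N(p) and slew_N(p) are notation introduced here, and it assumes that "capacitance usage" means the total wire capacitance of the network.

```latex
\begin{aligned}
\min_{N} \quad & \sum_{e \in E(N)} c_e \\
\text{s.t.} \quad & \mathrm{delay}_N(p) \le D, \quad \forall p \in P, \\
& \mathrm{slew}_N(p) \le S, \quad \forall p \in P, \\
& \text{every } p \in P \text{ is connected to the source in } N.
\end{aligned}
```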

Post-grid Clock Routing
[Figure: 3-D plot with a Layer axis (3 to 7) and two planar coordinate axes]

Overall Algorithm
- Critical ports
  - Ports with large capacitance or far away from the source
- Path expansion algorithm
  - Elmore-delay driven
  - Expanding in some selected directions
- Post-processing
  - Wire replacement
  - Topology refinement
- Iterations
  - The overall algorithm is repeatedly invoked
  - May fail when the number of iterations > K (user specified)

Delay-driven Path Expansion Algorithm
- Basic steps
  - Simultaneously expand from all ports
  - Select the path with the minimum Elmore delay to further expand
  - Connect the ports to the source once the path reaches the source grid
  - Check delay/slew constraints
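
The basic steps above can be sketched with a min-heap keyed on Elmore delay. This is a simplified illustration, not the authors' implementation: the function name `expand_paths`, the edge tuple layout `(u, v, R, C)`, and the example values are all assumptions. It treats each port's path independently, so it does not model chain paths, selected expansion directions, or the slew check. Because paths grow from the port toward the source, the capacitance below each newly added wire is already known, and the Elmore delay can be updated incrementally.

```python
import heapq

def expand_paths(edges, port_cap, sources):
    """Grow paths from every port; return {port: (elmore_delay, path)}."""
    # Undirected adjacency list: node -> [(neighbor, R, C), ...].
    adj = {}
    for u, v, r, c in edges:
        adj.setdefault(u, []).append((v, r, c))
        adj.setdefault(v, []).append((u, r, c))

    # Heap entry: (delay from path head down to the port, cap below head, path).
    heap = [(0.0, cap, [p]) for p, cap in port_cap.items()]
    heapq.heapify(heap)
    connected = {}

    while heap and len(connected) < len(port_cap):
        delay, cap, path = heapq.heappop(heap)   # minimum Elmore delay first
        port, head = path[0], path[-1]
        if port in connected:
            continue                             # this port already reached a source
        if head in sources:
            connected[port] = (delay, path)
            continue
        for nbr, r, c in adj.get(head, []):
            if nbr in path:
                continue                         # do not revisit a node
            # The new upstream wire drives half its own cap plus everything below.
            new_delay = delay + r * (c / 2 + cap)
            new_cap = cap + c + port_cap.get(nbr, 0.0)
            heapq.heappush(heap, (new_delay, new_cap, path + [nbr]))
    return connected


# Hypothetical example: two ports, two source-grid nodes.
edges = [("P1", "C1", 1, 2), ("C1", "S1", 2, 4), ("P1", "P2", 1, 1),
         ("P2", "C2", 1, 1), ("C2", "S2", 1, 1)]
result = expand_paths(edges, {"P1": 3, "P2": 1}, {"S1", "S2"})
```

In this toy network both ports end up routed to S2, with P1 going through P2, since that route has a lower Elmore delay than the high-resistance wire to S1.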

A Routing Example
- Initially, the heap is empty
- First iteration (simultaneously expand from all ports)
  - Heap = {(P1,P2); (P1,C1); (P2,P3); (P2,P1); (P3,P2); (P3,C2)}
- Second iteration (P1,P2)
  - Heap = {(P3,C2); (P1,P2,P3); (P1,C1); (P2,P3); (P2,P1); (P3,P2)}
- Third iteration (P3,C2)
  - Heap = {(P3,C2,S2); (P1,P2,P3); (P1,C1); (P2,P3); (P2,P1); (P3,P2)}

A Routing Example
- Fourth iteration (identify chain paths)
  - Heap = {(P1,P2,P3); (P1,C1); (P2,P3); (P2,P1)}
  - Chain path = {(P1,P2,P3); (P2,P3)}
- Fifth iteration (P2,P3)
  - Heap = {(P1,P2,P3); (P1,C1)}
  - Chain path = {(P1,P2,P3); (P1,P2)}
- Sixth iteration (P1,P2)
  - Heap = {}, chain path = {}
- Final result

Post-processing Techniques
- Wire replacement
  - Two types of wires
    - Capacitance/resistance tradeoff
  - Procedures
    - Identify port P_l with the largest Elmore delay
    - Replace wires in a bottom-up style
    - Check delay/slew constraints
- Example
  - Port with largest delay: P5
  - Replace edge P1C1
  - Replace edge P4C2
  - Replace edges P2P3, P3C1
  - Replace P5C3, C3C2, C2C1, C1S1
[Figure: five snapshots of the example network with nodes S1, C1-C3, P1-P5]
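
A rough sketch of the wire replacement procedure follows. Several things here are assumptions rather than details from the slides: that replacement swaps a wire for a lower-capacitance, higher-resistance type; that the network is a tree with a single source; that edges off the critical path are tried first (inferred from the order of the example); and that only the delay limit is checked, not slew. The names `elmore_delays`, `replace_wires`, and `alt_wire` are hypothetical.

```python
def elmore_delays(parent, wire, node_cap):
    """Elmore delay from the root to every node of an RC tree.

    parent:   child -> parent (the root has no entry)
    wire:     child -> (R, C) of the wire joining the child to its parent
    node_cap: node -> load capacitance (missing nodes count as 0)
    """
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    root = next(n for n in children if n not in parent)

    order = [root]                        # breadth-first order, parents first
    for node in order:
        order.extend(children.get(node, []))

    down = {}                             # total capacitance at and below a node
    for node in reversed(order):
        down[node] = node_cap.get(node, 0.0) + sum(
            wire[ch][1] + down[ch] for ch in children.get(node, []))

    delay = {root: 0.0}
    for node in order[1:]:
        r, c = wire[node]
        delay[node] = delay[parent[node]] + r * (c / 2 + down[node])
    return delay


def replace_wires(parent, wire, alt_wire, node_cap, ports, delay_limit):
    """Swap wires for their alternative type while all port delays stay legal."""
    wire = dict(wire)
    delay = elmore_delays(parent, wire, node_cap)
    worst = max(ports, key=lambda p: delay[p])   # port with the largest delay

    def path_to_root(node):
        path = []
        while node in parent:
            path.append(node)                    # an edge is named by its child
            node = parent[node]
        return path                              # bottom-up order

    critical = path_to_root(worst)
    others = [n for n in wire if n not in critical]
    others.sort(key=lambda n: len(path_to_root(n)), reverse=True)  # deepest first

    for edge in others + critical:
        old = wire[edge]
        wire[edge] = alt_wire[edge]
        new_delay = elmore_delays(parent, wire, node_cap)
        if max(new_delay[p] for p in ports) > delay_limit:
            wire[edge] = old                     # revert: limit would be violated
    return wire
```

Recomputing every delay after each swap keeps the sketch short; an incremental update would be the natural choice for networks with thousands of ports.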

Post-processing Techniques
- Topology refinement
  - Procedures
    - Disconnect a port P
    - Expand P towards all directions
    - Select paths with smaller capacitance
    - Check delay/slew constraints
- Example
  - Elmore delay: P5 > P4 > P6 > P2 > P1 > P3 > P7
  - Sequentially process all the ports
[Figure: four snapshots of the example network with nodes S1, S2, C1-C5, P1-P7]

Non-tree Extensions
- A small number of ports have exceptionally large capacitances
  - The delay of the shortest path of such a port exceeds the delay limit D
- Procedures
  - Establish a shortest path for p
  - Find a second shortest path
  - Target delay not met? Add all useful crosslinks
  - Target delay not met? Do the same thing for the parent node of p
- Example
  - Connect p to S1
  - Find a second source S2
  - Add crosslinks
  - Find a third source S3
  - Add crosslinks

Experiment Setup
- Environment
  - Implemented in C++
  - Run on a Linux server
  - Intel Pentium 4 3.2GHz
  - 2GB RAM
  - Delay setup: 5ps
  - Slew setup: input 10ps; output 15ps
- Benchmarks
  - 3 test cases provided by industry
  - 11 test cases from the ISPD 2010 Clock Network Synthesis Contest
- Comparisons
  - Compared with TG, which was proposed by Shelar in ISPD’09, TCAD’10

Tree Growing Algorithm
- Proposed in R. Shelar ISPD’09, TCAD’10
- Delay/slew constraints
- Greedy expansion from the source
  - Edges with the smallest capacitance will be added into the network
- Example
  - Expand from the source
  - Add S1C1, S2C2
  - Add C2P3
  - Add C1P1
  - Add P3P2
[Figure: six snapshots of the example network with nodes S1, S2, C1, C2, P1-P3]
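
The greedy expansion described on this slide resembles a multi-source Prim-style growth. The sketch below is based only on the one-line description here, not on the published TG algorithm: the function name `tree_growing`, the edge tuple layout `(u, v, capacitance)`, and the capacitance values are assumptions, and the delay/slew check is left as a comment.

```python
import heapq

def tree_growing(edges, sources, ports):
    """Greedily add the smallest-capacitance edge that leaves the current network."""
    adj = {}
    for u, v, cap in edges:
        adj.setdefault(u, []).append((v, cap))
        adj.setdefault(v, []).append((u, cap))

    in_tree = set(sources)
    heap = [(cap, u, v) for u in sources for v, cap in adj.get(u, [])]
    heapq.heapify(heap)
    added = []

    while heap and not set(ports) <= in_tree:
        cap, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue
        # A fuller version would check delay/slew constraints before accepting.
        in_tree.add(v)
        added.append((u, v))
        for w, cap_w in adj.get(v, []):
            if w not in in_tree:
                heapq.heappush(heap, (cap_w, v, w))
    return added


# Hypothetical capacitances chosen so the edges are added in the slide's order.
edges = [("S1", "C1", 1), ("S2", "C2", 1), ("C2", "P3", 2),
         ("C1", "P1", 3), ("P3", "P2", 4), ("P1", "P2", 5)]
added = tree_growing(edges, ["S1", "S2"], ["P1", "P2", "P3"])
```

With these values the edges come out as S1C1, S2C2, C2P3, C1P1, P3P2, matching the sequence listed in the example.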

Comparisons: capacitance
- Without post-processing: 18.3% improvement
  [Chart: Capacitance (without post-processing techniques); y-axis: capacitance (fF), 0 to 25000; x-axis: test cases 1 to 14; series: TG, Ours]
- With topology refinement: 24.6% improvement
  [Chart: Capacitance (with topology refinement); y-axis: capacitance (fF), 0 to 25000; x-axis: test cases 1 to 14; series: TG, Ours]
