
DESIGN AND TUNING OF A TREE-MESH CLOCK DISTRIBUTION
Nikhil Jayakumar, Dave Murata, Valery Kugel
{nikhilj, dmurata, valery}@juniper.net
Juniper Networks

OVERVIEW
• Comparison of clock trees vs. clock grids/meshes
• Juniper’s clock distribution design overview
• Juniper’s 2-step tuning flow for clock meshes
  • Coarse tuning
  • Fine tuning
• Conclusion

CLOCK TREES VS CLOCK GRIDS
There are two kinds of clock skew:
• Structural (layout) skew
  • Capacitive load mismatch
  • Wire length mismatch
  Handled by balanced clock trees (e.g., H-trees)
  • Zero / low skew only in the absence of PVT variations
• Skew due to PVT variations
  Two types of approaches handle this:
  • Dynamic:
    • Dynamic clock deskewing schemes
  • Static:
    • Cross-link addition
    • Clock mesh / grid / hybrid tree-mesh
    Static approaches necessitate SPICE-based analysis
    • Regular STA won’t work due to re-convergences (more on this later)

VERTICAL CLOCK SPINE + HORIZONTAL CLOCK RIBS
Constructed to:
• be balanced
• have low latency (and hence low jitter)
Wire width, spacing, buffer drive strength, and wire length between buffers were chosen after careful simulation. Factors considered:
• Jitter (chose wire code for minimum jitter per unit length)
• Slew constraints
• Dynamic IR drop and EM limits
• Routability and area constraints
• Overshoot and undershoot due to inductance
PVT variations are cancelled out through insertion of cross-links (shorting wires) at regular intervals.
• Cross-links were inserted only if the skew reduction outweighed the jitter increase.

WHY DO CROSS-LINKS COMPLICATE TIMING?
• STA cannot handle re-convergence in non-linear circuits.
• SPICE confirms the averaging effect of the short, but STA cannot see this.
• Where is the point of divergence?
• A SPICE simulation is needed to estimate delays.
[Figure: clock tree with a cross-link; nodes are annotated with delays from 0p to 350p in 50p steps, and two nodes are marked "395p ?" and "405p ?"]

JUNIPER GLOBAL CLOCK DISTRIBUTION
Hybrid tree-mesh:
• Balanced tree driving a mesh
• Cross-links are also added at regular intervals in the tree to reduce skew due to PVT
Construction:
• PLL drives the vertical spine
• Vertical spine drives 6 horizontal ribs
• Horizontal ribs drive the clock mesh
Technology details:
• Frequency: 700 MHz to 800 MHz
• TSMC 40nm (45GS_1P10M_6X1Y2Z + Al RDL)
• Top 2 (thick) metal layers (Mz) used to distribute the core clock
[Figure: floorplan showing the vertical clock spine, the horizontal clock ribs, and the core clock region; dimension labels 3.1mm and 17mm]

WHY REDUCE SKEW IN A MESH?
Q: Clock meshes reduce skew, so why do we have to tune them?
• Clock meshes average the delay, but at the cost of short-circuit current
• Large skew can result in a very large short-circuit current for drivers whose outputs are shorted
Do not rely on the mesh to reduce structural skew. The mesh is used only to reduce PVT skew.
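A rough estimate shows why skew between shorted drivers is costly. The sketch below assumes a simple linear model that is not taken from the slides: during the skew window one driver already pulls the shared wire high while the other still pulls it low, so the contention current is roughly VDD divided by the sum of the two driver on-resistances and the wire resistance. The function names and all resistance values are hypothetical; only the 0.9 V supply appears in the presentation.

```python
def contention_current(vdd_v, r_pullup_ohm, r_pulldown_ohm, r_wire_ohm=0.0):
    """Approximate current (A) while one shorted driver is high and the other is low."""
    return vdd_v / (r_pullup_ohm + r_pulldown_ohm + r_wire_ohm)


def contention_charge(vdd_v, r_pullup_ohm, r_pulldown_ohm, skew_s, r_wire_ohm=0.0):
    """Approximate charge (C) wasted per clock edge; it grows linearly with skew."""
    current = contention_current(vdd_v, r_pullup_ohm, r_pulldown_ohm, r_wire_ohm)
    return current * skew_s


if __name__ == "__main__":
    # Hypothetical driver and wire resistances, chosen only for illustration.
    for skew_ps in (100, 30):
        charge = contention_charge(0.9, 50.0, 50.0, skew_ps * 1e-12, r_wire_ohm=10.0)
        print(f"skew {skew_ps} ps: about {charge * 1e15:.0f} fC per edge per driver pair")
```

In this model the wasted charge is proportional to the skew, which is the reason to remove structural skew before the outputs are shorted rather than letting the mesh absorb it.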

JUNIPER’S 2-STEP TUNING FLOW
1. Coarse tuning through balancing
• Tuning the vertical spine and horizontal ribs through RC balancing
• Tuning the mesh through selective removal of horizontal cross-link wires in the mesh
  • Based on the effective wire length (capacitance) driven by each buffer
2. Fine tuning through driver sizing
• Automatic driver tuning flow that sizes drivers in the vertical spine and horizontal ribs
• Drivers are sized to achieve uniform output delay and slew
• The flow can simultaneously size several thousand buffers
  • Manual tuning is impossible on such a scale

COARSE TUNING FLOW OF THE MESH
Flow steps, in order:
1. Start from the DB with the full clock mesh.
2. Remove all horizontal Mz wires of the clock mesh except the ones closest to the horizontal clock ribs.
3. Find the effective length (and thus capacitance) of the vertical Mz wires of the clock mesh driven by each buffer.
4. Add back horizontal Mz cross-links such that the total effective capacitance is equal across all output buffers.
5. Extract the clock mesh (STAR-RC).
6. Simulate in SPICE and verify skew.
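The balancing step (finding the effective length per buffer, then adding cross-links back) can be sketched as follows. This is a minimal illustration, not the authors' tool: it assumes a uniform capacitance per unit length, assumes each re-added cross-link segment loads exactly one buffer, and uses the most heavily loaded buffer as the balancing target. The function names and all numbers are hypothetical.

```python
def effective_cap(length_um, cap_per_um):
    """Wire capacitance under a uniform capacitance-per-length assumption."""
    return length_um * cap_per_um


def crosslinks_to_add(vertical_len_um, link_len_um, cap_per_um):
    """Return, per buffer, how many cross-link segments to add back.

    The target is the load of the most heavily loaded buffer, so that every
    buffer ends up driving approximately the same total capacitance.
    """
    base = [effective_cap(length, cap_per_um) for length in vertical_len_um]
    target = max(base)
    c_link = effective_cap(link_len_um, cap_per_um)
    return [round((target - c) / c_link) for c in base]


def balanced_caps(vertical_len_um, link_len_um, cap_per_um):
    """Total capacitance per buffer after the cross-links are added back."""
    counts = crosslinks_to_add(vertical_len_um, link_len_um, cap_per_um)
    c_link = effective_cap(link_len_um, cap_per_um)
    return [
        effective_cap(length, cap_per_um) + n * c_link
        for length, n in zip(vertical_len_um, counts)
    ]


if __name__ == "__main__":
    lengths_um = [1000.0, 800.0, 900.0]  # hypothetical vertical Mz length per buffer
    print(crosslinks_to_add(lengths_um, link_len_um=100.0, cap_per_um=0.2e-15))
```

The result of such a calculation would still need extraction and SPICE simulation to confirm the skew, as the flow above requires.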

FINE TUNING FLOW
Flow steps, in order:
1. Start from the extracted netlist.
2. Simulate in SPICE and gather slew and delay data.
3. Re-size buffers based on the slew at the output of the buffers (the aim is to make the slew uniform at all buffers).
4. Simulate the modified netlist (with re-sized buffers) and gather slew and delay data.
5. Is [skew(previous_run) - skew(current_run)] > 1ps? If YES, go back to step 3. If NO, the result is the modified netlist and DB.
Buffers are sized based on output slew:
• If the slew is larger than the target slew, the buffer is up-sized proportionally to achieve the target slew
• If the slew is smaller than the target slew, the buffer is down-sized proportionally to achieve the target slew
The fine tuning flow is able to converge to a low-skew solution within 2 to 3 iterations.
Buffers can be re-sized without re-extracting since the buffers are designed to be footprint compatible.
• This saves significant runtime since extraction alone can take a day or more
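The sizing loop can be sketched in a few lines. This is only an illustration under stated assumptions: the `simulate` function is a toy stand-in for SPICE (slew modelled as an intrinsic term plus a term proportional to load over drive strength, delay as a fixed fraction of slew), buffer sizes are continuous rather than picked from a footprint-compatible library, and the target slew and all loads are hypothetical. Only the proportional re-sizing rule and the 1 ps stopping test come from the slide.

```python
TARGET_SLEW_PS = 40.0      # hypothetical target slew
MIN_IMPROVEMENT_PS = 1.0   # stopping threshold from the flow above


def simulate(sizes, loads_ff, intrinsic_ps=10.0, k_ps_per_ff=0.5):
    """Toy stand-in for SPICE: returns (slews, delays) per buffer in ps."""
    slews = [intrinsic_ps + k_ps_per_ff * c / s for s, c in zip(sizes, loads_ff)]
    delays = [0.6 * slew for slew in slews]  # assumed delay model
    return slews, delays


def skew(delays):
    """Skew as the spread between the latest and earliest buffer output."""
    return max(delays) - min(delays)


def resize(sizes, slews, target_ps):
    """Up-size slow buffers and down-size fast ones in proportion to slew."""
    return [s * slew / target_ps for s, slew in zip(sizes, slews)]


def fine_tune(sizes, loads_ff, target_ps=TARGET_SLEW_PS, max_iters=20):
    """Iterate re-size and re-simulate until skew improves by 1 ps or less."""
    slews, delays = simulate(sizes, loads_ff)
    prev_skew = skew(delays)
    history = [prev_skew]
    for _ in range(max_iters):
        sizes = resize(sizes, slews, target_ps)
        slews, delays = simulate(sizes, loads_ff)
        cur_skew = skew(delays)
        history.append(cur_skew)
        if prev_skew - cur_skew <= MIN_IMPROVEMENT_PS:
            break
        prev_skew = cur_skew
    return sizes, history


if __name__ == "__main__":
    final_sizes, skew_history = fine_tune([1.0, 1.0, 1.0], [40.0, 80.0, 120.0])
    print([round(s, 3) for s in final_sizes])
    print([round(x, 2) for x in skew_history])
```

Because the netlist is not re-extracted between iterations, each pass of the real flow costs only one simulation, which is what makes a loop of this shape practical.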

RESULTS
The tuning flow allowed us to reduce the structural skew of the mesh.
• Skew was reduced to < 30ps across the whole core region and across multiple process corners (from > 100ps before tuning)
The removal of the majority of the cross-links also helped save power.
• Power consumed by the distribution (including buffers in the vertical spine + horizontal ribs) was 1.4W for a 16mm x 17mm clock mesh area at 0.9V, 800 MHz
• Removal of the horizontal cross-links helped reduce mesh capacitance and thus clock power by 30%
[Figure: example of a skew plot over a 16mm x 17mm core region; skew = 30ps, core clock area = 16mm x 17mm]
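The link between mesh capacitance and clock power follows from the usual dynamic power relation P = C * V^2 * f for a net that switches full swing every cycle. The sketch below applies it to the reported operating point. It assumes that all of the reported power is dynamic switching power, which is a simplification: the real figure also includes buffer short-circuit and leakage power. The function names are hypothetical.

```python
def effective_switched_cap(power_w, vdd_v, freq_hz):
    """Lumped C_eff = P / (V^2 * f), assuming purely dynamic, full-swing power."""
    return power_w / (vdd_v ** 2 * freq_hz)


def power_after_cap_cut(power_w, cap_reduction):
    """Dynamic power is linear in C, so a fractional cut in C cuts P equally."""
    return power_w * (1.0 - cap_reduction)


if __name__ == "__main__":
    c_eff = effective_switched_cap(1.4, 0.9, 800e6)
    print(f"effective switched capacitance: about {c_eff * 1e9:.2f} nF")
    # Normalised illustration: a 30% cut in capacitance leaves 70% of the power.
    print(power_after_cap_cut(1.0, 0.30))
```

Under these assumptions the distribution behaves like a lumped capacitance of roughly 2 nF switching at 800 MHz, and any fractional reduction in switched capacitance gives the same fractional reduction in dynamic power.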

CONCLUSION
We have presented a 2-step tuning flow that can de-skew and tune a clock mesh containing several thousand buffers.
• The fine-tuning flow enables 2 to 3 iterations to be completed within 24 hours
• Structural skew of more than 100ps was reduced to less than 25ps
Removal of horizontal Mz cross-links in the clock mesh helped reduce clock power.
• Clock distribution + mesh consumed a total of 1.4W in a 100W chip
• The removal of most of the horizontal cross-links reduced mesh capacitance and power by ~30%
This tuning flow was used in multiple chips across two technology generations.

Thank You
