Speaker: Jianchao Lu. Authors: Jianchao Lu, Xiaomi Mao, Baris Taskin. VLSI Lab, Electrical & Computer Engineering, Drexel University 1
Outline: Preliminaries, Previous Works, Methodology, Experimental Results, Conclusions 2
Clock Mesh Network Consists of a top-level clock tree, mesh grids, and stub wires. 3
Power Dissipation on Clock Network
Clock network is a global network of interconnect wires and buffers. Clock signal switching introduces a lot of dynamic power dissipation. Consumes more than 40% of the total power.
$P = \alpha \, C \, V_{DD}^{2} \, f_{clk}$, where $\alpha$ is the switching factor, $C$ is the capacitance, $V_{DD}$ is the supply voltage, and $f_{clk}$ is the clock frequency.
Switching capacitance $= \alpha \, C_{total} = \alpha \, (C_{grid} + C_{stub} + C_{tree})$ 5
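As a quick illustration of the power relation on this slide, the sketch below evaluates the dynamic power from the three capacitance components. The function name and all numeric values are hypothetical, chosen only to make the arithmetic easy to follow; they are not data from the talk.

```python
def clock_dynamic_power(alpha, c_grid, c_stub, c_tree, vdd, f_clk):
    """Dynamic clock power P = alpha * C_total * VDD^2 * f_clk (SI units)."""
    # Total clock capacitance is the sum of mesh grid, stub wire and
    # top-level tree capacitance, as on the slide.
    c_total = c_grid + c_stub + c_tree
    return alpha * c_total * vdd ** 2 * f_clk


# Hypothetical values: 2 pF grid, 1 pF stubs, 1 pF tree, 1 V supply, 1 GHz.
baseline = clock_dynamic_power(1.0, 2e-12, 1e-12, 1e-12, 1.0, 1e9)
# Halving the stub capacitance lowers the total power proportionally.
shorter_stubs = clock_dynamic_power(1.0, 2e-12, 0.5e-12, 1e-12, 1.0, 1e9)
print(baseline, shorter_stubs)
```

Because the power is linear in the total capacitance, any reduction in grid or stub wire translates directly into a proportional power saving, which is the motivation for the methodology that follows.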
Most Relevant Previous Works
[1] A. Rajaram and D. Pan. Meshworks: An efficient framework for planning, synthesis and optimization of clock mesh networks. In Asia and South Pacific Design Automation Conference (ASPDAC), pages 250 – 257, Jan. 2008.
[2] M. R. Guthaus, G. Wilke, and R. Reis. Non-uniform clock mesh optimization with linear programming buffer insertion. In Proceedings of the ACM/IEEE Design Automation Conference (DAC), pages 74 – 79, June 2010.
[3] M. Cho, D. Z. Pan, and R. Puri. Novel Binary Linear Programming for High Performance Clock Mesh Synthesis. In Proceedings of IEEE/ACM Int'l Conference on Computer-Aided Design (ICCAD), San Jose, CA, pages 438 – 443, November 2010. 7
Meshworks [1] Identifies relationship between grid size and total mesh wire. Optimal grid size based on skew. Mesh reduction. Modified buffer driver insertion. [1] A. Rajaram and D. Pan. Meshworks: An efficient framework for planning, synthesis and optimization of clock mesh networks. In Asia and South Pacific Design Automation Conference (ASPDAC), pages 250 – 257, Jan. 2008. 8
Non-uniform Mesh [2] [2] M. R. Guthaus, G. Wilke, and R. Reis. Non-uniform clock mesh optimization with linear programming buffer insertion. In Proceedings of the ACM/IEEE Design Automation Conference (DAC), pages 74 – 79, June 2010. 9
ILP Based Mesh Synthesis [3] Mesh generation and sink assignment algorithms. [3] M. Cho, D. Z. Pan, and R. Puri. Novel Binary Linear Programming for High Performance Clock Mesh Synthesis. In Proceedings of IEEE/ACM Int'l Conference on Computer-Aided Design (ICCAD), pages 438 – 443, November 2010. 10
Proposed Method Optimizing the placement during the clock mesh synthesis. [Figure: flow diagram with steps numbered 1 to 4] 12
Step 1: Creating Feasible Moving Region of Each Register [Figure; labels: Final, Fanout, Fanin, Registers, Initial, Register] 13
Creating Feasible Moving Regions 14
Step 2: Mesh Generation Registers can be moved within their feasible moving regions without creating negative timing slack. Choose as the mesh network the minimum number of mesh tracks such that every register can be moved onto one of them. 19
Mesh Generation Problem
Problem: Treat each candidate mesh track as a set and each register as an element; a track's set contains the registers whose feasible moving regions reach that track. Finding the minimum number of sets that together include all the elements (a set cover problem) is then equivalent to finding the minimum number of mesh tracks to which all the registers can connect.
Greedy algorithm: Repeatedly add the candidate mesh track with the minimum cost.
Cost of each grid wire = (total distance of the registers from the grid wire) / (number of new elements added to the solution set). 21
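The greedy selection described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the `reachable` and `distance` inputs, and the track and register names are all hypothetical. The slide does not say which registers enter the distance total, so the sketch assumes the total is taken over the newly covered registers only.

```python
from typing import Dict, List, Set


def greedy_mesh_tracks(
    reachable: Dict[str, Set[str]],
    distance: Dict[str, Dict[str, float]],
) -> List[str]:
    """Greedily pick mesh tracks until every register is covered.

    reachable[t]   -- registers whose feasible moving region reaches track t
    distance[t][r] -- distance from register r to track t
    """
    uncovered: Set[str] = set()
    for regs in reachable.values():
        uncovered |= regs
    chosen: List[str] = []
    while uncovered:
        best_track, best_cost = None, float("inf")
        for track, regs in reachable.items():
            newly_covered = regs & uncovered
            if not newly_covered:
                continue  # this track adds nothing new
            # Assumption: distance is summed over newly covered registers.
            total = sum(distance[track][r] for r in newly_covered)
            cost = total / len(newly_covered)
            if cost < best_cost:
                best_track, best_cost = track, cost
        if best_track is None:
            break  # no track with finite cost can cover what is left
        chosen.append(best_track)
        uncovered -= reachable[best_track]
    return chosen


# Hypothetical example: two horizontal tracks and one vertical track.
reachable = {"h1": {"a", "b", "c"}, "h2": {"c", "d"}, "v1": {"d"}}
distance = {
    "h1": {"a": 1.0, "b": 1.0, "c": 1.0},
    "h2": {"c": 1.0, "d": 1.0},
    "v1": {"d": 0.5},
}
tracks = greedy_mesh_tracks(reachable, distance)
print(tracks)
```

Greedy set cover is a heuristic: it does not guarantee the true minimum number of tracks, but it runs quickly and the cost term biases the choice toward tracks that are close to the registers they serve.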
Step 3: Incremental Register Placement
Objective: minimize the total stub wirelength.
Subject to: the timing constraints; the registers must not overlap (non-overlap constraints).
Variables: register locations. 22
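To make the objective concrete, the sketch below evaluates the total stub wirelength for a given register placement. It assumes, as a simplification not stated on the slide, that each register connects to its nearest mesh track with a perpendicular stub; the function names and coordinates are hypothetical, and the timing and non-overlap constraints are not modeled here.

```python
from typing import List, Sequence, Tuple


def stub_length(x: float, y: float,
                h_tracks: Sequence[float],
                v_tracks: Sequence[float]) -> float:
    """Length of a perpendicular stub from (x, y) to the nearest track.

    h_tracks holds the y coordinates of horizontal tracks,
    v_tracks holds the x coordinates of vertical tracks.
    """
    d_h = min((abs(y - ty) for ty in h_tracks), default=float("inf"))
    d_v = min((abs(x - tx) for tx in v_tracks), default=float("inf"))
    return min(d_h, d_v)


def total_stub_wirelength(registers: List[Tuple[float, float]],
                          h_tracks: Sequence[float],
                          v_tracks: Sequence[float]) -> float:
    """Objective value: sum of stub lengths over all registers."""
    return sum(stub_length(x, y, h_tracks, v_tracks) for x, y in registers)


# Hypothetical mesh: horizontal tracks at y=0 and y=3, vertical track at x=5.
h_tracks, v_tracks = [0.0, 3.0], [5.0]
before = total_stub_wirelength([(1.0, 1.0), (4.0, 2.0)], h_tracks, v_tracks)
# Moving the first register down onto the y=0 track removes its stub.
after = total_stub_wirelength([(1.0, 0.0), (4.0, 2.0)], h_tracks, v_tracks)
print(before, after)
```

An incremental placer would search over register locations inside each feasible moving region to lower this value while keeping the registers from overlapping.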
The Incremental Placement Results (s35932 in ISCAS89) Before placement After placement 23
Top Level Clock Tree Generation Insert buffer drivers at the intersections of the mesh grid wires [1][2]. Generate a top-level clock tree whose sinks are the buffer drivers of the mesh grid wires (buffered DME). 24
Experimental Results
Set 1: Compare the proposed method with [2] using different grid sizes.
Set 2: Compare the proposed method with [2] using the same grid sizes.

Circuit    Set 1 Proposed   Set 1 [2]   Set 2 Proposed   Set 2 [2]
s13207     6*7              8*8         6*7              6*7
s15850     5*4              8*8         5*4              5*4
s35932     11*7             12*12       11*7             11*7
s38417*    10*9             12*12       10*9             10*9
s38584     12*7             11*11       12*7             12*7

[2] M. R. Guthaus, G. Wilke, and R. Reis. Non-uniform clock mesh optimization with linear programming buffer insertion. In Proceedings of the ACM/IEEE Design Automation Conference (DAC), pages 74 – 79, June 2010. 26
Mesh Wire Reduction Set 1 (different grid sizes): average improvement of 51.9%. Set 2 (same grid sizes): average improvement of 50.8%. 27
Clock Power Reduction Set 1 (different grid sizes): average improvement of 48.3%. Set 2 (same grid sizes): average improvement of 28.1%. 28
Skew Results (45nm PTM) Set 1 (different grid sizes): average skew is in the same range. Set 2 (same grid sizes): skew is improved by 0.8ps. 29
Trade-off The trade-off is the logic wirelength change due to the register placement. 30
Implications of Placement Congestion Before Register Placement After Register Placement 31
Routing Congestion The timing slack decreases by an average of 22ps, which is small (about 1.1%) relative to the 2ns clock period. 32
Conclusions Advantages: significantly reduced power dissipation; guaranteed timing slack (pre-routing). Disadvantages: power density increase; timing slack decrease. 34