Power Grid Analysis with Hierarchical Support Graphs
Xueqian Zhao, Jia Wang, Zhuo Feng, Shiyan Hu
Design Automation Group
2011 International Conference on Computer-Aided Design (ICCAD)
Power Grid Modeling & Analysis

- Multi-layer interconnects are modeled as a 3D RC network
  - Switching gate effects are modeled by time-varying current loadings
- DC analysis solves the linear system $G \cdot v = b$ (tens of millions of unknowns!)
- Transient analysis solves $G \cdot v(t) + C \cdot \frac{dv(t)}{dt} = b(t)$
  - $G \in \mathbb{R}^{n \times n}$: conductance matrix
  - $C \in \mathbb{R}^{n \times n}$: capacitance matrix
  - $v \in \mathbb{R}^{n \times 1}$: node voltage vector
  - $b \in \mathbb{R}^{n \times 1}$: current loading vector

(Figure: power grid with Vdd pads and time-varying current sources)
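The two analyses above can be sketched in a few lines. This is a minimal illustration on a hypothetical 3-node grid (all conductance, capacitance, and current values are made up), using dense solves in place of the sparse solvers a real power grid requires; the transient step uses backward Euler.

```python
import numpy as np

# Hypothetical 3-node grid; values are for illustration only.
G = np.array([[ 3.0, -1.0, -1.0],
              [-1.0,  2.0, -1.0],
              [-1.0, -1.0,  3.0]])    # conductance matrix G (SPD after grounding)
C = np.diag([1e-12, 2e-12, 1e-12])   # node capacitance matrix C
b = np.array([1e-3, 0.0, 2e-3])      # current loading vector b

# DC analysis: solve G v = b for the node voltages
v_dc = np.linalg.solve(G, b)

# Transient analysis, backward Euler with time step h:
#   (G + C/h) v_{k+1} = b(t_{k+1}) + (C/h) v_k
h = 1e-12
A = G + C / h
v = v_dc.copy()
for _ in range(10):                  # b(t) held constant here for simplicity
    v = np.linalg.solve(A, b + (C / h) @ v)
```

With a constant loading vector the transient iteration simply stays at the DC operating point, which makes the sketch easy to check.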
Prior Work

- Prior power grid analysis approaches
  - Direct methods (LU factorization, Cholesky decomposition)
    - CHOLMOD uses 7 GB of memory and >1,000 s for a 9-million-node grid
  - Iterative methods
    - Preconditioned conjugate gradient (T. Chen et al., DAC'01)
    - Multigrid methods (S. Nassif et al., DAC'00)
  - Stochastic methods
    - Random walk (H. Qian et al., DAC'05)

(Figure: direct method, random walk, and multigrid approaches illustrated on a power grid)
Support-Graph Preconditioner

- Key observation on power grid designs
  - The degree (number of neighbors) of each power grid node is relatively low
- Support-graph preconditioners (SGP)
  - Finding a maximum spanning tree in the original grid is efficient
  - Factorizing the spanning-tree matrix is much cheaper than factorizing the original matrix
  - Cost: zero new fill-ins (linear time and space complexity)
  - Robustness: very effective for ill-conditioned systems
  - Suitable for both regular and irregular grids (where multigrid/Poisson solvers may fail)
  - Good for both conventional and incremental power grid analysis

(Figure: a 9-node example grid and its maximum spanning tree, with the original matrix and the tree-only preconditioner matrix shown side by side)
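The construction above can be sketched with SciPy's spanning-tree routine. This is a hypothetical 4-node grid with made-up conductances; SciPy only ships a *minimum* spanning tree, so the weights are negated to obtain the maximum one, and a small grounding term stands in for the pad conductances so both matrices are SPD.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree

# Hypothetical 4-node grid; edge weights are conductances.
n = 4
edges = {(0, 1): 2.0, (1, 2): 4.0, (2, 3): 1.0, (0, 3): 3.0, (0, 2): 0.5}
W = np.zeros((n, n))
for (i, j), g in edges.items():
    W[i, j] = W[j, i] = g

# Negate weights: the minimum spanning tree of -W is the maximum spanning tree of W.
mst = minimum_spanning_tree(csr_matrix(-W)).toarray()
tree = -(mst + mst.T)                  # symmetric adjacency of the max spanning tree

def grounded_laplacian(adj, ground=1.0):
    # Graph Laplacian plus a grounding term so the matrix is SPD
    return np.diag(adj.sum(axis=1)) - adj + ground * np.eye(len(adj))

G_mat = grounded_laplacian(W)          # original grid matrix
P = grounded_laplacian(tree)           # support-graph preconditioner (tree-only)
```

Because P has only n-1 off-diagonal edges arranged as a tree, a sparse Cholesky factorization of it generates no new fill-in, which is the cost claim on this slide.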
Support-Graph Preconditioner (cont.)

- The SG preconditioner (P) can approximate the original matrix (G) well
  - Matches the dominant eigenvalues well
  - Existing matrix solvers can be used directly
  - Allows easy multi-/many-core parallel computation
  - Parallel GPU-based MST algorithms can handle 5M grid nodes in 1 s
- The condition number of $P^{-1}G$ can be greatly reduced
  - Effectively reduces the number of Krylov-subspace iterations

| Matrix    | 1st    | 2nd    | 3rd    | 4th    | 5th   | 6th   | cond    |
|-----------|--------|--------|--------|--------|-------|-------|---------|
| G         | 26.170 | 23.182 | 17.572 | 11.514 | 9.373 | 6.673 | 135.948 |
| P         | 25.239 | 23.540 | 17.579 | 10.909 | 9.865 | 6.822 | 16.752  |
| $P^{-1}G$ | 1.431  | 1.204  | 1.062  | 1.000  | 1.000 | 1.000 | 17.442  |
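The "existing solvers can be used directly" point can be sketched by dropping a support-graph matrix P into SciPy's conjugate gradient as the preconditioner. The 4-node G and tree-structured P here are hypothetical stand-ins; in the real flow P would be Cholesky-factored once with zero fill-in rather than inverted densely.

```python
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

# Hypothetical 4-node grid matrix (grounded Laplacian, SPD)
G = np.array([[ 6., -2.,  0., -3.],
              [-2.,  7., -4.,  0.],
              [ 0., -4.,  6., -1.],
              [-3.,  0., -1.,  5.]])
P = G.copy()
P[2, 3] = P[3, 2] = 0.0      # drop the weakest edge (2,3): off-diagonals now form a tree
P[2, 2] -= 1.0
P[3, 3] -= 1.0
b = np.array([1e-3, 0.0, 2e-3, 0.0])

# Preconditioner operator applying P^{-1} (a cached sparse solve in practice)
M = LinearOperator(G.shape, matvec=lambda r: np.linalg.solve(P, r))

def solve(prec):
    iters = []
    x, info = cg(G, b, M=prec, callback=lambda xk: iters.append(1))
    return x, len(iters)

x_plain, n_plain = solve(None)   # plain CG
x_sg, n_sg = solve(M)            # support-graph preconditioned CG
```

Both runs reach the same solution; on large ill-conditioned grids the preconditioned run is the one that converges in far fewer Krylov iterations, as the table above indicates.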
Support-Graph Preconditioner (cont.)

- Condition number of the matrix pencil (G, P):
  $\kappa(G, P) = \dfrac{\lambda_{\max}(G, P)}{\lambda_{\min}(G, P)}$
- The support of (G, P) is defined as:
  $\sigma(G, P) = \min\{\tau \in \mathbb{R} \mid x^{T}(\tau P - G)x \ge 0 \text{ for all } x \in \mathbb{R}^{n}\}$
  - P and G can be decomposed into $P = P_1 + P_2 + \dots$ and $G = G_1 + G_2 + \dots$
  - $\tau P_i - G_i$ are positive semi-definite matrices
  - Eigenvalues of (G, P) are bounded by $\tau$
  - Interpretation: P achieves similar power dissipation in the resistive network, where the power dissipated by G is $x^{T}Gx$ and the power dissipated by P is $x^{T}Px$
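The definitions above can be checked numerically on a tiny hypothetical grid: since the support graph differs from the grid only by dropped PSD edge terms, the generalized eigenvalues of the pencil (G, P) are squeezed toward 1, so the pencil's condition number is far smaller than that of G alone.

```python
import numpy as np
from scipy.linalg import eigh

# Hypothetical 4-node grounded grid matrix
G = np.array([[ 6., -2.,  0., -3.],
              [-2.,  7., -4.,  0.],
              [ 0., -4.,  6., -1.],
              [-3.,  0., -1.,  5.]])
# Support-graph matrix: the weakest edge (2,3) is dropped, leaving a tree
P = np.array([[ 6., -2.,  0., -3.],
              [-2.,  7., -4.,  0.],
              [ 0., -4.,  5.,  0.],
              [-3.,  0.,  0.,  4.]])

lam_G = np.linalg.eigvalsh(G)
lam_GP = eigh(G, P, eigvals_only=True)   # generalized eigenvalues of the pencil (G, P)
cond_G = lam_G[-1] / lam_G[0]
cond_GP = lam_GP[-1] / lam_GP[0]         # condition number of P^{-1} G
```

Here G - P is the PSD Laplacian of the dropped edge, so every pencil eigenvalue is at least 1, matching the support-based bound on the slide.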
Hierarchical Support Graphs

- Build a support graph for each grid partition
- Block-level support graph
  - Use the strongest edge between two blocks to connect them

(Figure: a block-partitioned power grid example, the spanning tree inside a block, and the top-level spanning tree)
Modified Hierarchical Support Graph (cont.)

- Include several of the strongest links between two blocks
  - Balances the effectiveness and computational cost of HSG preconditioners
  - Slightly increases the number of non-zeros in the preconditioner matrix factors
  - P is also known as an ultra-sparsifier of the matrix G
- Construction:
  1. Extract the block-level SG
  2. Replace each block-level edge with several strong edges from the original graph

(Figure: block-level support graph and inner-block support graph)
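The edge-selection rule above can be sketched as a small helper. The edge list, block labels, and function name here are all hypothetical; the point is only that each block pair keeps its k strongest inter-block conductances instead of the single strongest one.

```python
# Hypothetical helper illustrating the modified HSG inter-block edge selection
def select_interblock_edges(edges, block_of, k=2):
    """edges: {(i, j): conductance}; block_of: node -> block id."""
    by_pair = {}
    for (i, j), g in edges.items():
        bi, bj = block_of[i], block_of[j]
        if bi != bj:                                  # inter-block edge
            by_pair.setdefault(tuple(sorted((bi, bj))), []).append((g, i, j))
    kept = []
    for lst in by_pair.values():
        lst.sort(reverse=True)                        # strongest conductance first
        kept.extend((i, j) for g, i, j in lst[:k])    # keep the top k per block pair
    return kept

# Made-up example: nodes 0,1 in one block; nodes 2,3 in the other
edges = {(0, 2): 5.0, (1, 3): 2.0, (1, 2): 4.0, (0, 3): 1.0}
block_of = {0: 0, 1: 0, 2: 1, 3: 1}
```

With k=1 this reduces to the plain block-level support graph of the previous slide; raising k trades a few extra non-zeros for a stronger preconditioner.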
Incremental Analysis

- A modified grid needs to be re-solved much faster
  - Low-rank matrix-factor updates through direct methods can be very costly
  - Iterative methods need their preconditioners to be updated efficiently
- Support-graph preconditioner updates
  - Global grid modifications: the global SG must be updated (re-factored)
  - Local grid modifications: only the affected block SGs need to be updated (re-factored)

(Figure: preconditioner update for incremental analysis under local grid modifications, blocks 1-4)
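The local-update policy can be sketched as follows. The preconditioner is stored as per-block Cholesky factors, and a local grid change re-factors only the touched block; the two 2x2 blocks here are hypothetical stand-ins for block support graphs.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

# Two hypothetical SPD block support-graph matrices
blocks = [np.array([[4., -1.], [-1., 3.]]),
          np.array([[5., -2.], [-2., 4.]])]
factors = [cho_factor(B) for B in blocks]     # setup: factor every block once

def apply_preconditioner(r):
    # Apply blockdiag(P1, P2)^{-1} r using the cached Cholesky factors
    return np.concatenate([cho_solve(f, r[2*i:2*i + 2])
                           for i, f in enumerate(factors)])

# Local modification: a resistor inside block 1 changes...
blocks[1][0, 0] += 0.5
factors[1] = cho_factor(blocks[1])            # ...so only block 1 is re-factored
```

Block 0's factor is reused untouched, which is why the re-factor column in the incremental results below is so cheap for local changes.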
Experimental Results: Original SGPCG

- DC analysis results (times in seconds)

| CKT    | SG Setup Time | #iterations | SGPCG Solve Time |
|--------|---------------|-------------|------------------|
| ibmpg2 | 0.148         | 38          | 0.286            |
| ibmpg3 | 1.178         | 68          | 3.884            |
| ibmpg4 | 1.503         | 28          | 2.418            |
| ibmpg5 | 1.161         | 212         | 11.4             |
| ibmpg6 | 1.944         | 172         | 14.2             |
| ibmpg7 | 1.385         | 100         | 9.746            |
| ibmpg8 | 2.453         | 104         | 11.1             |

- Transient analysis
  - Requires only one or two iterations for each time step
  - Can be further improved by scaling the capacitance values in the SG
Experimental Results: Hierarchical SGPCG

| CKT    | HSGPCG     | Modified HSGPCG |
|--------|------------|-----------------|
| ibmpg2 | 52 (5.0X)  | 49 (6.9X)       |
| ibmpg3 | 112 (4.5X) | 106 (5.1X)      |
| ibmpg4 | 83 (5.5X)  | 79 (7.2X)       |
| ibmpg5 | 182 (2.0X) | 150 (2.7X)      |
| ibmpg6 | 208 (1.4X) | 158 (1.8X)      |
| ibmpg7 | 120 (2.8X) | 101 (2.9X)      |
| ibmpg8 | 122 (2.6X) | 102 (2.8X)      |

- About the modified HSGPCG solver
  - Requires only an extra 1 MB of memory for the largest CKT
  - Yields much better convergence for ill-conditioned matrices
Experimental Results: Incremental Analysis

- Updated block support-graph preconditioner
  - Values of 20% of the resistors are changed by 50%
  - Using the original SG preconditioner may require many more iterations

|        |               | Updated Block SGPCG |              |         | Original Block SGPCG |              |         |
|--------|---------------|---------------------|--------------|---------|----------------------|--------------|---------|
| CKT    | Re-factor (s) | #iter.              | Re-solve (s) | Speedup | #iter.               | Re-solve (s) | Speedup |
| ibmpg2 | 0.012         | 9                   | 0.067        | 20.5X   | 15                   | 0.087        | 18.8X   |
| ibmpg3 | 0.053         | 18                  | 0.748        | 22.5X   | 41                   | 2.375        | 7.6X    |
| ibmpg4 | 0.047         | 18                  | 0.699        | 29.9X   | 29                   | 1.313        | 17.1X   |
| ibmpg5 | 0.043         | 32                  | 2.079        | 6.8X    | 85                   | 3.661        | 3.9X    |
| ibmpg6 | 0.071         | 24                  | 2.475        | 9.1X    | 50                   | 4.737        | 4.9X    |
| ibmpg7 | 0.060         | 19                  | 1.708        | 20.5X   | 674                  | 72.380       | 0.5X    |
| ibmpg8 | 0.058         | 13                  | 1.380        | 24.0X   | 39                   | 3.320        | 10.4X   |
Future Work: Support-Circuit Preconditioner

- Support-circuit preconditioners
  - Leverage existing parallel Krylov-subspace solvers (e.g., GMRES)
  - Target extremely large-scale nonlinear circuits
  - E.g., clock networks, power delivery networks, memory blocks, etc.

(Figure: a power delivery network with LDOs, showing the original circuit with analog and digital blocks and its support-circuit preconditioner)