heuristics for nested dissection to reduce fill in in
play

Heuristics for Nested Dissection to Reduce Fill-in in Sparse Matrix - PowerPoint PPT Presentation

Heuristics for Nested Dissection to Reduce Fill-in in Sparse Matrix Factorizations Miguel Vargas-Flix, Salvador Botello-Rionda miguelvargas@cimat.mx, botello@cimat.mx Morelia, November 6, 2013 1/28 Sparse matrices In most problems of finite


  1. Heuristics for Nested Dissection to Reduce Fill-in in Sparse Matrix Factorizations Miguel Vargas-Félix, Salvador Botello-Rionda miguelvargas@cimat.mx, botello@cimat.mx Morelia, November 6, 2013 1/28

  2. Sparse matrices In most problems of finite element method, finite volume or isogeometric analysis we have to solve a linear system of equations A x = b . When the stiffness matrix is assembled, the relation between adjacent nodes is captured as entries in a matrix. Because a node has adjacency with only a few others, the resulting matrix has a very sparse structure. ( ⋱ ) ∘ ∘ ∘ ∘ ∘ ∘ ∘ ⋯ ∘ ∘ ∘ ∘ ∘ ∘ ∘ ⋯ ∘ a i i ∘ a i j ∘ 0 ∘ ⋯ ∘ ∘ ∘ ∘ ∘ ∘ ∘ ⋯ i j ∘ a j i ∘ a j j ∘ 0 ∘ ⋯ ∘ ∘ ∘ ∘ ∘ ∘ ∘ ⋯ ∘ 0 ∘ 0 ∘ a k k ∘ ⋯ ∘ ∘ ∘ ∘ ∘ ∘ ∘ ⋯ k ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ http://www.cimat.mx/~miguelvargas 2/28

  3. Lets define the notation η( A ) , it indicates the number of non-zero entries of A . 556 × 556 has 309,136 entries, with η( A )= 1810, this means that only the 0.58% of For example, A ∈ℝ the entries are non zero. Black dots indicates a non zero entry in the matrix In finite element problems all matrices have symmetric structure, and depending on the problem symmetric values or not. http://www.cimat.mx/~miguelvargas 3/28

  4. Cholesky factorization for sparse matrices T is O ( n 3 ) . For full matrices the computational complexity of Cholesky factorization A = LL To calculate entries of L L j j ( A i j − ∑ L i k L j k ) , for i > j j − 1 L i j = 1 k = 1 L j j = √ A j j − ∑ j − 1 2 . L j k k = 1 We use four strategies to reduce time and memory usage when performing this factorization on sparse matrices: 1. Reordering of rows and columns of the matrix to reduce fill-in in L . This is equivalent to use a permutation matrix to reorder the system ( P AP T ) ( Px ) = ( P b ) . 2. Use symbolic Cholesky factorization to obtain an exact L factor (non zero entries in L ). 3. Organize operations to improve cache usage. 4. Parallelize the factorization. http://www.cimat.mx/~miguelvargas 4/28

  5. Matrix reordering We want to reorder rows and columns of A , in a way that the number of non-zero entries of L are reduced. η ( L ) indicates the number of non-zero entries of L . A = L = 556 × 556 , with η ( A ) = 1810 . To the right the lower triangular The stiffness matrix to the left A ∈ℝ matrix L , with η ( L ) = 8729 . There are several heuristics like the minimum degree algorithm [Geor81] or a nested dissection method [Kary99]. http://www.cimat.mx/~miguelvargas 5/28

  6. By reordering we have a matrix A ' with η ( A ' ) = 1810 and its factorization L ' with η ( L ' ) = 3215. Both factorizations solve the same system of equations. A ' = L ' = We reduce the factorization fill-in by η ( L ' ) = 3215 η ( L ) = 8729 = 0.368 . To determine a “good” reordering for a matrix A that minimize the fill-in of L is an NP complete problem [Yann81]. http://www.cimat.mx/~miguelvargas 6/28

  7. Symbolic Cholesky factorization The algorithm to determine the L i j entries that area non-zero is called symbolic Cholesky factorization [Gall90]. Let be, for all columns j = 1 … n , a j ≝ { k > j ∣ A k j ≠ 0 } , l j ≝ { k > j ∣ L k j ≠ 0 } . The sets r j will register the columns of L which structure will affect the column j of L . for i ← 1,2, … ,n r i ← ∅ A = ( a 66 ) L = ( l 66 ) · a 11 a 12 a 16 l 11 for i ← 1,2, … ,n a 21 a 22 a 23 a 24 l 21 l 22 l 32 l 33 l i ← a i a 32 a 33 a 35 · l 42 l 43 l 44 a 42 a 44 for each j ∈ r i · l 53 l 54 l 55 a 53 a 55 a 56 l i ← l i ∪ l j ∖{ i } · · l 61 l 62 l 63 l 64 l 65 a 61 a 65 if l i ≠∅ · a 2 = { 3,4 } l 2 = { 3,4,6 } p ← min { j ∈ l i } · · r p ← r p ∪ { i } · · This algorithm is very efficient, its complexity in time and space has an order of O (η( L )) . http://www.cimat.mx/~miguelvargas 7/28

  8. How efficient is it? The next table shows results solving a 2D Poisson equation problem, comparing Cholesky and conjugate gradient with Jacobi preconditioning. Number of equations nnz(A) nnz(L) Cholesky time [s] CGJ time [s] 1,006 6,140 14,722 0.086 0.081 3,110 20,112 62,363 0.137 0.103 10,014 67,052 265,566 0.309 0.184 31,615 215,807 1’059,714 1.008 0.454 102,233 705,689 4’162,084 3.810 2.891 312,248 2’168,286 14’697,188 15.819 19.165 909,540 6’336,942 48’748,327 69.353 89.660 3’105,275 21’681,667 188’982,798 409.365 543.110 10’757,887 75’202,303 743’643,820 2,780.734 3,386.609 32,703.9 10,000 3,386.6 10,000 2,780.7 7,134.4 4,656.1 1,000 543.1 1,747.3 409.4 1,343.5 1,000 512.5 100 393.2 89.7 69.4 Memory [Mega bytes] 145.3 134.9 100 19.2 15.8 Time [s] 10 44.1 38.2 3.8 2.9 13.6 10 10.2 1 1.0 4.3 0.5 2.6 Cholesky 0.3 1.3 0.2 1 CGJ 0.1 Cholesky 0.7 0 0.1 0.1 0.1 0.4 CGJ 0 0 1,000 10,000 100,000 1,000,000 10,000,000 1,000 10,000 100,000 1,000,000 10,000,000 Equations Equations http://www.cimat.mx/~miguelvargas 8/28

  9. Graph reordering Let G = ( V , E ) be a graph with n vertex V with m edges E , 2 1 4 7 6 3 5 A ordering (or tagging) α of G is simply a mapping from the set { 1,2, … ,n } in V , where n is the α , E α = ( V α ) . number of vertex of G . The graph tagged by α will be written G 4 5 6 3 1 7 2 http://www.cimat.mx/~miguelvargas 9/28

  10. A , E A = ( V A ) , in it Let A be a sparse matrix, symmetric, of size n × n , the ordered graph of A , G { v i ,v j } ∈ E A if and only if A i j = A j i ≠ 0 , i ≠ j . Here v i is a vertex of V A with tag A . A = ( ● ) ● ● ● ● · ● · 2 ● ● · ● · · · ● · ● · ● · · 1 4 A , E A = ( V A ) G ● ● · ● · · ● · · ● · ● · ● 7 6 3 ● · · · · ● · 5 · · · ● ● · A , E A = ( V A ) is equivalent to a apply a permutation matrix. An ordering α applied to G T = ( ● ) ● · · · ● · · 4 · ● ● · · · ● · ● ● · · ● · 5 6 α , E α = ( V α ) G PA P · · · ● ● ● · 3 ● · · ● ● ● ● 1 7 · · ● ● ● ● · 2 · ● · · ● · For the fill-in of the Cholesky factorization, to find a “good” permutation P for A that reduces the number of non-zero entries on L means to find a “good”ordering of its graph [Geor81]. http://www.cimat.mx/~miguelvargas 10/28

  11. Nested dissection algorithm Separator Lets consider a separator set S ⊂ V in G = ( V , E ) , when removed from G the graph is disconnected in two non-empty sets of vertex R ⊂ V and S ⊂ V , also R ∩ S =∅ , S ∩ T =∅ , R ∩ T =∅ y R ∪ S ∪ T = V In the next example V = { 1,2, … , 13 } , let S = { 1,10,13 } be a separator of G = ( V , E ) , when removed from the graph we obtain R = { 4,3,7,8,11 } and T = { 2,5,6,9,12 } 1 4 5 4 5 3 2 3 2 7 6 7 6 10 8 8 9 11 9 11 12 12 13 http://www.cimat.mx/~miguelvargas 11/28

  12. This procedure applied in a recursive way is known as the generalized nested dissection algorithm [Lipt79]. The idea is to split the graph trying to make R and T of the same size with a small separator. This is an divide-and-conquer recursive method. Dissection G ( V , E ) If ∣ G ∣ <β The vertexes V are ordered with any order · (the minimum degree algorithm can be used) else Search for a separator set S ⊂ V such that after removing it from G · we get two sets of disconnected vertexes R and T , with ∣ R ∣ ≈ ∣ T ∣ . Remove all edges from E that have connections with entries in S . · Dissection G ( R , E ) · Dissection G ( T , E ) · Order the vertexes, putting the ones in S at the end of the order. · This algorithm is recursive and the resulting order of vertexes of G produces an efficient Cholesky factorization. An improved version of this algorithm is presented in [Kary99]. http://www.cimat.mx/~miguelvargas 12/28

  13. This is an exaple of the nested dissection algorithm: Graph Dependency tree 1 4 5 3 2 7 6 1, 2, 3, 4, 5, 6, 7, 10 8, 9, 10, 11, 12, 8 16 13, 14, 15, 16, 11 12 9 17, 18, 19, 20 13 19 18 15 14 17 20 1 4 5 3 2 5, 2, 10, 7 6 13, 17 10 8 16 11 12 9 1, 4, 3, 7, 6, 12, 18, 20, 13 19 8, 11, 14 9, 15, 16, 19 18 15 14 17 20 http://www.cimat.mx/~miguelvargas 13/28

  14. Graph Dependency tree 1 4 3 5, 2, 10, 7 6 13, 17 8 16 11 12 9 1, 4, 3, 7, 6, 12, 18, 20, 19 8, 11, 14 9, 15, 16, 19 18 15 14 20 1 5, 2, 10, 4 13, 17 3 7 6 8 16 7, 8 12, 15 11 12 9 19 1, 11, 18, 6, 9, 18 15 14 4, 14 20 16, 19 3 20 http://www.cimat.mx/~miguelvargas 14/28

  15. Graph Dependency tree 1 5, 2, 10, 4 13, 17 3 6 16 7, 8 12, 15 11 9 19 1, 11, 18, 6, 9, 18 14 4, 14 20 16, 19 3 20 5, 2, 10, 1 13, 17 4 3 6 7, 8 12, 15 16 11 9 11, 18, 9, 4 19 14 20 16 18 14 1 3 20 6 19 http://www.cimat.mx/~miguelvargas 15/28

Recommend


More recommend