1. ERCIM 2013: Fitting techniques for estimating the trace of the inverse of a matrix. Andreas Stathopoulos, Lingfei Wu, Jesse Laeuchli (College of William and Mary); Vasilis Kalantzis (University of Minnesota); Stratis Gallopoulos (University of Patras, Greece). Acknowledgments: NSF, DOE SciDAC.

2. The problem. Given a large $N \times N$ matrix $A$ and a function $f$, find the trace of $f(A)$: $\mathrm{Tr}(f(A))$. Common functions: $f(A) = A^{-1}$, $\log(A)$, $\exp(A)$, $R_i^T A^{-1} R_j$, ... Applications: UQ, data mining, quantum Monte Carlo, lattice QCD. Our focus: $f(A) = A^{-1}$, but the techniques are general.

3. Standard underlying method: Monte Carlo (Hutchinson 1989). If $x$ is a vector of random $\mathbb{Z}_2$ (Rademacher) variables, $x_i = \pm 1$ each with probability $1/2$, then $E(x^T A^{-1} x) = \mathrm{Tr}(A^{-1})$.

Monte Carlo Trace:
  for i = 1:n
    x = randZ2(N, 1)
    sum = sum + x^T A^{-1} x
  end
  trace = sum / n
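For reference, a minimal runnable sketch of this estimator; the NumPy setting, the dense solve, and the toy matrix are additions of this sketch (the slide gives only the pseudocode above).

```python
import numpy as np

def hutchinson_trace_inv(A, n_samples, rng=None):
    """Monte Carlo estimate of Tr(A^{-1}) with Rademacher (Z_2) probe vectors."""
    rng = np.random.default_rng(0) if rng is None else rng
    N = A.shape[0]
    total = 0.0
    for _ in range(n_samples):
        x = rng.choice([-1.0, 1.0], size=N)   # x_i = +/-1 with probability 1/2
        total += x @ np.linalg.solve(A, x)    # x^T A^{-1} x; iterative solve in practice
    return total / n_samples

# Toy SPD example: diag(1..100) plus a rank-one positive term
A = np.diag(np.arange(1.0, 101.0)) + 0.1 * np.ones((100, 100))
print(hutchinson_trace_inv(A, 200), np.trace(np.linalg.inv(A)))
```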

4. Standard underlying method: Monte Carlo (Hutchinson 1989), continued. Same estimator as the previous slide; two problems remain: (1) a large number of samples n may be needed; (2) how to compute $x^T A^{-1} x$.

5. Standard underlying method: Monte Carlo (Hutchinson 1989), continued. For the quadratic form $x^T A^{-1} x$: solve $Ay = x$ versus quadrature (Golub '69, Bai '95, Meurant '06, '09, Strakos '11). $O(100\text{-}1000s)$ of statistically independent right-hand sides are needed; recycling (de Sturler) and deflation (Morgan, AS '07) help.

6. Selecting the vectors in $x^T A^{-1} x$ to minimize variance or error:
• Random $x \in \mathbb{Z}_2^N$: best variance for real matrices (Hutchinson 1989).
• $x = e_i$: variance depends only on $\mathrm{diag}(A^{-1})$.
• $x = F^T e_i$, mixing with $F$ = DFT or Hadamard (Avron et al. 2010).
• Deterministic $x = H e_i$, $i = 1, \ldots, 2^k$: Hadamard in natural order (Bekas et al. 2007).
• Probing: $x^m_i = 1$ if $i \in C_m$, 0 otherwise; assumes a multicolored graph (Tang et al. 2011).
• $x = H(p_m, k_i)$: hierarchical probing for lattices (A.S., J.L. 2013); maintains the benefits of probing but is cheap and incremental.
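As an illustration of the deterministic Hadamard option above, a hedged sketch; the estimator form, the SciPy call, and the restriction to $N$ being a power of 2 are assumptions of this sketch, not statements from the slides.

```python
import numpy as np
from scipy.linalg import hadamard

def hadamard_trace_inv(A, s):
    """Trace estimate using the first s columns of a Sylvester Hadamard matrix
    (natural order) as probe vectors; exact when s = N, approximate otherwise."""
    N = A.shape[0]                        # must be a power of 2 for hadamard(N)
    H = hadamard(N).astype(float)         # entries are +/-1
    est = 0.0
    for i in range(s):
        x = H[:, i]
        est += x @ np.linalg.solve(A, x)  # x^T A^{-1} x
    return est / s
```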

7. Variance of the estimators.
Rademacher vectors $x_i \in \mathbb{Z}_2^N$: $\widehat{Tr} = \frac{1}{s}\sum_{i=1}^{s} x_i^T A^{-1} x_i$, with $\mathrm{Var}(\widehat{Tr}) = \frac{2}{s}\|\tilde A^{-1}\|_F^2 = \frac{2}{s}\sum_{i \ne j} (A^{-1}_{ij})^2$ (where $\tilde A^{-1}$ is $A^{-1}$ without its diagonal).
Diagonal sampling $x = e_{j(i)}$: $\widehat{Tr} = \frac{N}{s}\sum_{i=1}^{s} A^{-1}_{j(i),j(i)}$, with $\mathrm{Var}(\widehat{Tr}) = \frac{N^2}{s}\mathrm{Var}(\mathrm{diag}(A^{-1}))$.
[Figure: magnitude of the $A^{-1}$ entries vs. the resulting variance.] It is unclear a priori which method is best.
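The two variance formulas can be checked numerically; a small sketch assuming $A^{-1}$ can be formed explicitly for a toy problem (the function name is a placeholder added here).

```python
import numpy as np

def estimator_variances(Ainv, s):
    """Variances of the two s-sample estimators, given A^{-1} explicitly:
    Rademacher:        (2/s)   * sum_{i != j} (A^{-1}_{ij})^2
    Diagonal sampling: (N^2/s) * Var(diag(A^{-1}))."""
    N = Ainv.shape[0]
    d = np.diag(Ainv)
    offdiag_sq = np.linalg.norm(Ainv, 'fro')**2 - np.sum(d**2)
    return 2.0 * offdiag_sq / s, N**2 * np.var(d) / s
```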

8. Why focus on the diagonal method? The trace is the integral of a 1-D signal. Can we improve on Monte Carlo? Not without external information about the distribution of the diagonal elements. Our goals:
• What if we have an approximation $M \approx \mathrm{diag}(A^{-1})$?
• Is $\mathrm{Tr}(M) \approx \mathrm{Tr}(A^{-1})$ sufficient?
• If not, can we use a fitting $p(M)$ (regression/interpolation/quadrature)?
• Can the fitting reduce $\mathrm{Var}(p(M) - \mathrm{diag}(A^{-1}))$?

9. Approximations to $\mathrm{diag}(A^{-1})$:
• Inexpensive bounds on the diagonal elements (Robinson and Wathen '92); e.g., for $A$ SPD, $1/A_{ii}$ often captures the pattern of $\mathrm{diag}(A^{-1})$.
• Let $[L, U] = \mathrm{ILU}(A)$ (incomplete LU) and $M = \mathrm{diag}(U^{-1} L^{-1})$; requires only the $A^{-1}_{i,j}$ entries within the sparsity of $L, U$ (Erisman, Tinney '75).
• Eigen/singular vectors: $M = \mathrm{diag}(X \Lambda^{-1} Y^T)$ for the nev smallest eigenvalues; already available from deflating multiple right-hand sides! The number of eigenvectors can be increased while solving $Ax = e_i$ (eigCG).
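A minimal sketch of the eigenvector-based approximation for a symmetric $A$ (so $X = Y$); the SciPy shift-invert call and the function name are choices made here, and the ILU route of Erisman and Tinney would instead use only the entries of $A^{-1}$ reachable through the sparsity of $L$ and $U$.

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def diag_inv_from_eigs(A, nev=50):
    """M = diag(X Lambda^{-1} X^T) from the nev eigenvalues of symmetric A
    closest to zero, i.e. M_i = sum_k X[i,k]^2 / lambda_k."""
    lam, X = eigsh(A, k=nev, sigma=0.0, which='LM')  # shift-invert around 0
    return (X**2 / lam).sum(axis=1)
```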

10. For some problems M captures the pattern of $\mathrm{diag}(A^{-1})$ well. Laplacian delsq(numgrid('S',34)); deflation with the 15 smallest eigenpairs; ILU('crout','row',0.01). [Figure: diag(A^{-1}), diag(X Λ^{-1} X^T), and diag((LU)^{-1}) vs. index of diagonal.] The traces are not close, but $\mathrm{Var}(\mathrm{diag}(X \Lambda^{-1} X^T - A^{-1})) = 4\text{e-}4$ and $\mathrm{Var}(\mathrm{diag}((LU)^{-1} - A^{-1})) = 1\text{e-}2$. MC on $\mathrm{diag}(A^{-1} - M)$ can be competitive with Hutchinson's method.

11. In some cases the approximation is pointless. Rajat10 circuit simulation matrix (size 30202); M from the 100 smallest singular triplets. [Figure: diag(A^{-1}) and diag(M) vs. index of diagonal element.]

12. Capture the pattern better by fitting M to $D = \mathrm{diag}(A^{-1})$. MC resolves a shift $D = c + M$, but not a scale $D = bM$ (the variance may even increase!).
Approach 1. Least squares fit with $bM + c$:
1. Solve $D_i = e_i^T A^{-1} e_i$ for $i \in S$, a set of $k$ indices.
2. Find $[b, c] = \mathrm{argmin}\{\|D(S) - (bM(S) + c)\|_2,\; b, c \in \Re\}$.
Not many points (linear systems) are needed, typically 10-20. Significant improvement in the estimation of the trace; reduces the variance for potentially continuing with MC.
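A sketch of Approach 1; the sparse-direct solve, the function name, and treating $A$ as a SciPy sparse matrix in CSC format are assumptions of this sketch.

```python
import numpy as np
from scipy.sparse.linalg import spsolve

def fit_bM_plus_c(A_csc, M, S):
    """Approach 1: sample D_i = e_i^T A^{-1} e_i for i in S (k indices),
    least-squares fit D(S) ~ b*M(S) + c, and return the fitted diagonal."""
    N = A_csc.shape[0]
    S = np.asarray(S)
    D_S = np.empty(len(S))
    for j, i in enumerate(S):
        e = np.zeros(N); e[i] = 1.0
        D_S[j] = spsolve(A_csc, e)[i]                # (A^{-1})_{ii}
    G = np.column_stack([M[S], np.ones(len(S))])     # design matrix for b*M + c
    (b, c), *_ = np.linalg.lstsq(G, D_S, rcond=None)
    return b * M + c                                 # estimate of diag(A^{-1})

# Trace estimate: np.sum(fit_bM_plus_c(A_csc, M, S))
```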

13. Approach 1 example, $D \approx bM + c$. Matrix RDB5000, 50 smallest singular triplets, k = 20 points used to fit; accuracy of the linear systems and singular vectors is 1e-6. [Figures: diag(A^{-1}) vs. M (left) and vs. the fitted diag(bM + c) (right), by index of diagonal.]

  Tr(A^{-1})  = -267.880    Var(D)          = 6.1e-3
  Tr(b*M + c) = -267.544    Var(D - M)      = 4.3e-4
  Rel. Err.   = 1e-3        Var(D - bM - c) = 4.3e-5

14. Better fitting. The linear model preserves the shape of M and thus relies too much on the quality of M; interpolating with a higher-degree polynomial could be noisy.
Approach 2 (basic): Piecewise Cubic Hermite Interpolation (PCHIP).
1. Solve $D_i = e_i^T A^{-1} e_i$ for $i \in S$, a set of $k$ indices.
2. Fit $p(M(S)) = D(S)$.
For PCHIP to effectively capture the pattern (global and local) of D it needs:
• smoothness of the approximant,
• the elements of M(S) to appear in increasing order,
• to capture the whole range of values of D,
• to capture where most of the action in D is happening.

15. Approach 2: PCHIP.
1. $[\tilde M, J] = \mathrm{sort}(M)$, to obtain a CDF-like, smooth graph.
2. Choose Q, a set of k indices: $\{1, 2\} \in Q$, and the remaining $k - 2$ are chosen to minimize the trapezoidal-rule integration error of $\tilde M$. Do not consider indices that produce non-unique $\tilde M_i$ values.
[Figures: M and diag(A^{-1}) vs. index of diagonal (left); sorted M with the selected points and diag(A^{-1}) vs. diagonal index of sorted M (right).]
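One possible realization of step 2 in code: a simplified greedy stand-in (fix the two endpoints, then repeatedly add the index where the piecewise-linear interpolant of sorted M deviates most), rather than the exact trapezoidal-rule criterion of the slide; the function name is a placeholder.

```python
import numpy as np

def select_points(M_sorted, k):
    """Pick k indices into the sorted M so that the piecewise-linear interpolant
    through the picked points tracks M_sorted well (a proxy for minimizing the
    trapezoidal integration error). Ties in M_sorted are not handled specially."""
    n = len(M_sorted)
    picked = [0, n - 1]
    while len(picked) < k:
        picked.sort()
        interp = np.interp(np.arange(n), picked, M_sorted[picked])
        err = np.abs(M_sorted - interp)
        err[picked] = -1.0                 # never re-pick an index
        picked.append(int(np.argmax(err)))
    return sorted(picked)
```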

16. Approach 2: PCHIP.
1. $[\tilde M, J] = \mathrm{sort}(M)$.
2. Choose Q, a set of k indices.
3. $S = J(Q)$, the corresponding indices in the original ordering.
4. Solve $D_i = e_i^T A^{-1} e_i$ for $i \in S$.
[Figure: M, diag(A^{-1}), and the computed $A^{-1}(i,i)$ at the selected points, vs. diagonal index of sorted M.]

17. Approach 2: PCHIP.
1. $[\tilde M, J] = \mathrm{sort}(M)$.
2. Choose Q, a set of k indices.
3. $S = J(Q)$, the original ordering.
4. Solve $D_i = e_i^T A^{-1} e_i$ for $i \in S$.
5. PCHIP fit $p(M(S)) = D(S)$; use $p(M) \approx D$.
[Figures: M, diag(A^{-1}), and the computed $A^{-1}(i,i)$ (left); M, diag(A^{-1}), and the fit p(M) (right); both vs. diagonal index of sorted M.]
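Putting the five steps together; a minimal SciPy sketch with evenly spaced sample indices in the sorted order as a simplification (the greedy selection sketched two slides earlier, or the slide's trapezoidal criterion, could be substituted), and a sparse-direct solve assumed.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator
from scipy.sparse.linalg import spsolve

def pchip_trace_estimate(A_csc, M, k=20):
    """Approach 2: sort M, pick k sample indices, compute the exact (A^{-1})_{ii}
    there, PCHIP-fit D(S) as a function of M(S), and sum p(M) over the diagonal."""
    N = A_csc.shape[0]
    J = np.argsort(M)                                   # step 1: M[J] is sorted
    Q = np.unique(np.linspace(0, N - 1, k).astype(int)) # step 2 (simplified choice)
    S = J[Q]                                            # step 3: original ordering
    D_S = np.empty(len(S))
    for j, i in enumerate(S):                           # step 4: D_i = e_i^T A^{-1} e_i
        e = np.zeros(N); e[i] = 1.0
        D_S[j] = spsolve(A_csc, e)[i]
    xs, idx = np.unique(M[S], return_index=True)        # PCHIP needs strictly increasing x
    p = PchipInterpolator(xs, D_S[idx])                 # step 5: fit p(M(S)) = D(S)
    return float(np.sum(p(M)))                          # Tr(A^{-1}) ~ sum_i p(M_i)
```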

18. Approach 2: PCHIP, continued. Same five steps as the previous slide; in addition, if step 4 also computes eigenvectors (eigCG), the sample points can be updated incrementally. [Figures as on the previous slide.]

19. Approach 2: PCHIP vs. the linear fit. [Figures: diag(A^{-1}), p(M), and bM + c over three ranges of the diagonal index of sorted M; p(M) shows an improvement over bM + c.]

20. Fitting examples with nev = k = 100: OLM5000, SiNa, KUU. [Figures: comparing diag(A^{-1}), M, and the fit p(M) vs. index of sorted M for each matrix.]
