ERCIM 2013 Fitting techniques for estimating the trace of the inverse of a matrix Andreas Stathopoulos, Lingfei Wu, Jesse Laeuchli College of William and Mary Vasilis Kalantzis University of Minnesota Stratis Gallopoulos University of Patras, Greece Acks: NSF, DOE SciDAC
The problem Given a large, N × N matrix A and a function f find trace of f ( A ) : Tr ( f ( A )) Common functions f ( A ) = A − 1 , log ( A ) , exp ( A ) , R T i A − 1 R j , ... Applications: UQ, Data Mining, Quantum Monte Carlo, Lattice QCD Our focus: f ( A ) = A − 1 but techniques general [ 2 ]
Standard underlying method Monte Carlo (Hutchinson 1989) If x is a vector of random Z 2 variables � 1 with probability 1 / 2 x i = − 1 with probability 1 / 2 then E ( x T A − 1 x ) = Tr ( A − 1 ) Monte Carlo Trace for i=1: n x = randZ2( N ,1) sum = sum + x T A − 1 x trace = sum / n [ 3 ]
Standard underlying method Monte Carlo (Hutchinson 1989) If x is a vector of random Z 2 variables � 1 with probability 1 / 2 x i = − 1 with probability 1 / 2 then E ( x T A − 1 x ) = Tr ( A − 1 ) Monte Carlo Trace 2 problems for i=1: n Large number of samples x = randZ2( N ,1) How to compute x T A − 1 x sum = sum + x T A − 1 x trace = sum / n [ 4 ]
Standard underlying method Monte Carlo (Hutchinson 1989) If x is a vector of random Z 2 variables � 1 with probability 1 / 2 x i = − 1 with probability 1 / 2 then E ( x T A − 1 x ) = Tr ( A − 1 ) Monte Carlo Trace for i=1: n Solve Ay = x vs quadrature x T A − 1 x x = randZ2( N ,1) sum = sum + x T A − 1 x Golub’69, Bai’95, Meurant’06,’09, Strakos’11 O ( 100 − 1000 s ) statistically independent RHS trace = sum / n Recycling (de Sturler), Deflation (Morgan, AS’07) [ 5 ]
Selecting the vectors in x T A − 1 x to min variance or error Random x ∈ Z N best variance for real matrices (Hutchinson 1989) 2 variance depends only on diag ( A − 1 ) x = e i x = F T e i mixing F = DFT, Hadamard (Avron et al. 2010) Deterministic x = He i , i = 1 ,..., 2 k Hadamard in natural order (Bekas et al. 2007) � 1 i ∈ C m x m i = Probing. Assumes multicolored graph (Tang et al. 2011) 0 else x = H ( p m , k i ) Hierarchical Probing for lattices (A.S, J.L. 2013) Maintains benefits of probing but cheap and incremental [ 6 ]
Variance of the estimators Rademacher vectors x i ∈ Z N 2 s ∑ i � = j ( A − 1 Tr = 1 s ∑ s i A − 1 x i Var ( Tr ) = 2 s � ˜ A − 1 � 2 F = 2 i = 1 x T i j ) 2 Diagonal x = e j ( i ) Var ( Tr ) = N 2 i = 1 A − 1 Tr = N s ∑ s s Var ( diag ( A − 1 )) j ( i ) , j ( i ) magnitude −1 A variance Unclear which method is best a-priori [ 7 ]
Why focus on the diagonal method? Trace = integral of a 1-D signal. Can we improve Monte Carlo? Not without external information about the distribution of diagonal elements Our goals: • What if we have an approximation M ≈ diag( A − 1 )? • Is Tr ( M ) ≈ Tr ( A − 1 ) sufficient? • If not, can we use fitting p ( M ) (regression/interpolation/quadrature)? • Can the fitting reduce Var ( p ( M ) − diag ( A − 1 )) ? [ 8 ]
Approximations to diag( A − 1 ) • Inexpensive bounds on diagonal elements (Robinson and Wathen ’92) e.g., for A SPD, 1 / A ii often capture the pattern of diag ( A − 1 ) • Let [ L , U ] = ILU ( A ) (incomplete LU) and M = diag ( U − 1 L − 1 ) Requires only A − 1 i , j entries from sparsity of L , U (Erisman, Tienny, ’75) • Eigen/singular vectors M = diag ( X Λ − 1 Y T ) , for nev smallest eigenvalues Already available from deflating multiple right hand sides! Number of eigenvectors can be increased while solving Ax = e i (eigCG) [ 9 ]
For some problems M captures pattern of diag( A − 1 ) well Laplacian delsq(numgrid(’S’,34)) 1.4 diag(A −1 ) diag(XL −1 X T ) 1.2 diag((LU) −1 ) Deflation: 15 smallest eigenpairs 1 ILU(’crout’,’row’,0.01) 0.8 Traces not close but 0.6 Var ( diag ( X Λ − 1 X T − A − 1 )) = 4e-4 0.4 Var ( diag (( LU ) − 1 − A − 1 )) = 1e-2 0.2 0 0 200 400 600 800 1000 Index of diagonal MC on diag ( A − 1 − M ) can be competitive to Hutchinson’s method [ 10 ]
In some cases approximation is pointless Rajat10 circuit simulation matrix (size 30202) 1 0.5 0 −0.5 diag(A −1 diag(M) −1 0 0.5 1 1.5 2 2.5 3 Index of diagonal element 4 x 10 M from 100 smallest singular triplets [ 11 ]
Capture pattern better by fitting M to D = diag ( A − 1 ) MC resolves shift D = c + M , but not scale D = bM (variance may increase!) Approach 1. Least squares fit with bM + c i A − 1 e i , for i ∈ S a set of k indices 1. Solve D i = e T 2. Find [ b , c ] = argmin {� D ( S ) − ( bM ( S )+ c ) � 2 , b , c ∈ ℜ } Not many points (linear systems) are needed. Typically 10-20. Significant improvement in the estimation of trace Reduces variance for potentially continuing with MC [ 12 ]
Approach 1 example D ≈ bM + c Matrix RDB5000, 50 smallest singular triplets, k=20 points used to fit Accuracy of systems and singular vectors is 1e-6. 0.15 diag(A −1 diag(A −1 M fitted: diag(bM+c) 0.1 0.1 0.05 0.05 0 0 −0.05 −0.05 −0.1 −0.1 −0.15 −0.15 −0.2 −0.2 0 1000 2000 3000 4000 5000 0 1000 2000 3000 4000 5000 Index of diagonal Index of diagonal Tr ( A − 1 ) − 267 . 880 Var ( D ) 6 . 1 e − 3 Tr ( b ∗ M + c ) − 267 . 544 Var ( D − M ) 4 . 3 e − 4 Rel . Err . 1 E − 3 Var ( D − bM − c ) 4 . 3 e − 5 [ 13 ]
Better fitting Linear model preserves shape of M , thus relies too much on the quality of M Interpolating with a higher degree polynomial could be noisy. Approach 2 basic. Piecewise Cubic Hermitian Spline Interpolation (PCHIP) i A − 1 e i , for i ∈ S a set of k indices 1. Solve D i = e T 2. Fit p ( M ( S )) = D ( S ) For PCHIP to effectively capture the pattern (global and local) of D it needs: • smoothness of the approximant • elements of M ( S ) to appear in increasing order • to capture the whole range of values of D • to capture where most of the action in D is happening [ 14 ]
Approach 2. Piecewise Cubic Hermitian Spline Interpolation (PCHIP) 1. [ ˜ M , J ] = sort( M ) to obtain a CDF-like, smooth graph 2. Choose Q a set of k indices: { 1 , 2 } ∈ Q and the k − 2 are chosen such that they minimize the integration error with trapezoidal rule of ˜ M . Do not consider indices that produce non-unique ˜ M i values. 0.15 0.05 diag(A −1 M 0.1 0 0.05 0 −0.05 M −0.05 selected points −0.1 diag(A −1 ) −0.1 −0.15 −0.15 −0.2 0 1000 2000 3000 4000 5000 0 1000 2000 3000 4000 5000 Index of diagonal Diagonal index of sorted M [ 15 ]
Approach 2. Piecewise Cubic Hermitian Spline Interpolation (PCHIP) 1. [ ˜ M , J ] = sort( M ) 2. Choose Q a set of k indices. 3. S = J ( Q ) the corresponding indices in original ordering i A − 1 e i , for i ∈ S 4. Solve D i = e T 0.05 0 −0.05 −0.1 M diag(A −1 ) −0.15 computed A −1 (ii) 0 1000 2000 3000 4000 5000 Diagonal index of sorted M [ 16 ]
Approach 2. Piecewise Cubic Hermitian Spline Interpolation (PCHIP) 1. [ ˜ M , J ] = sort( M ) 2. Choose Q a set of k indices. 3. S = J ( Q ) original ordering i A − 1 e i , for i ∈ S 4. Solve D i = e T 5. PCHIP fit p ( M ( S )) = D ( S ) . Use p ( M ) ≈ D 0.05 0.05 0 0 −0.05 −0.05 −0.1 −0.1 M M diag(A −1 ) diag(A −1 ) −0.15 −0.15 computed A −1 (ii) p(M) fit 0 1000 2000 3000 4000 5000 0 1000 2000 3000 4000 5000 Diagonal index of sorted M Diagonal index of sorted M [ 17 ]
Approach 2. Piecewise Cubic Hermitian Spline Interpolation (PCHIP) 1. [ ˜ M , J ] = sort( M ) 2. Choose Q a set of k indices. If (4) computes also evecs, 3. S = J ( Q ) original ordering update points incrementally i A − 1 e i , for i ∈ S 4. Solve D i = e T 5. PCHIP fit p ( M ( S )) = D ( S ) . Use p ( M ) ≈ D 0.05 0.05 0 0 −0.05 −0.05 −0.1 −0.1 M M diag(A −1 ) diag(A −1 ) −0.15 −0.15 computed A −1 (ii) p(M) fit 0 1000 2000 3000 4000 5000 0 1000 2000 3000 4000 5000 Diagonal index of sorted M Diagonal index of sorted M [ 18 ]
Approach 2. Piecewise Cubic Hermitian Spline Interpolation (PCHIP) 0.045 diag(A −1 ) p(M) bM+c 0.04 0.035 Improvement over bM+c 0.03 0.025 0.02 3500 4000 4500 5000 Diagonal index of sorted M −0.12 0.02 diag(A −1 ) p(M) −0.13 0 bM+c −0.14 −0.02 −0.15 −0.04 −0.16 −0.06 −0.17 −0.08 diag(A −1 ) p(M) −0.18 −0.1 bM+c 0 500 1000 1500 2000 2200 2400 2600 2800 3000 Diagonal index of sorted M Diagonal index of sorted M [ 19 ]
Fitting examples nev=k=100: OLM5000, SiNa, KUU Comparing diag(A −1 ), diag(M) and diag(M_fit) Comparing diag(A −1 ), diag(M) and diag(M_fit) 0 diag(A −1 ) −0.194 M Fit p(M) −0.196 −0.05 −0.198 Diagonal element Diagonal element −0.2 diag(A −1 ) −0.1 OLM5000 M −0.202 Fit p(M) −0.204 −0.15 −0.206 −0.208 −0.2 −0.21 0 1000 2000 3000 4000 5000 0 500 1000 1500 2000 2500 Index of sorted M Index of sorted M Comparing diag(A −1 ), diag(M) and diag(M_fit) Comparing diag(A −1 ), diag(M) and diag(M_fit) 1.4 0.2 1.2 0.15 1 0.1 0.8 KUU SiNa 0.05 0.6 0 0.4 −0.05 diag(A −1 ) diag(A −1 ) 0.2 M M −0.1 Fit p(M) Fit p(M) 0 0 1000 2000 3000 4000 5000 6000 7000 0 1000 2000 3000 4000 5000 Index of sorted M Index of sorted M [ 20 ]
Recommend
More recommend