ILAS 2013
Using ILU to estimate the diagonal of the inverse of a matrix
Andreas Stathopoulos, Lingfei Wu, Jesse Laeuchli (College of William and Mary)
Vasilis Kalantzis, Stratis Gallopoulos (University of Patras, Greece)
Acknowledgments: NSF, DOE SciDAC
The problem

Given a large N x N matrix A and a function f, find the trace of f(A): Tr(f(A)).
Common functions: f(A) = A^{-1}, f(A) = log(A), f(A) = R_i^T A^{-1} R_j.
Applications: UQ, data mining, quantum Monte Carlo, lattice QCD.
Our focus: f(A) = A^{-1}.
The methods

Currently all methods are based on Monte Carlo (Hutchinson 1989).
If x is a vector of random Z_2 (Rademacher) variables,
    x_i = +1 with probability 1/2,  x_i = -1 with probability 1/2,
then E(x^T A^{-1} x) = Tr(A^{-1}).

Monte Carlo trace:
    for i = 1:n
        x = randZ2(N, 1)
        sum = sum + x^T A^{-1} x
    trace = sum / n
The methods (cont.)

Two problems with the basic Monte Carlo trace estimator:
    1. A large number of samples is needed
    2. How to compute x^T A^{-1} x
The methods (cont.)

Each quadratic form x^T A^{-1} x is computed either by solving Ay = x with CG and forming y^T x, or via Lanczos quadrature.
(Golub '69, Bai '95, Meurant '06, '09, Strakos '11, ...)
The methods (cont.)

O(100s-1000s) of statistically independent right-hand sides are needed.
Recycling (de Sturler) and deflation (Morgan, A.S. '07, ...) speed up the Krylov methods.
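The Monte Carlo loop above can be written out as follows (a minimal NumPy sketch; `solve` stands in for whatever solver returns A^{-1} x, here an explicit inverse, and the small SPD matrix is only for illustration):

```python
import numpy as np

def hutchinson_trace(solve, n, N, seed=None):
    """Hutchinson's estimator: average x^T A^{-1} x over n
    Rademacher (Z_2) vectors x.  `solve(x)` must return A^{-1} x,
    e.g. via CG in a real application."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n):
        x = rng.choice([-1.0, 1.0], size=N)  # x_i = +/-1, each with prob 1/2
        total += x @ solve(x)                # one sample of x^T A^{-1} x
    return total / n

# Illustration on a small SPD matrix where A^{-1} is available exactly.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
Ainv = np.linalg.inv(A)
est = hutchinson_trace(lambda x: Ainv @ x, 4000, 3, seed=0)
exact = np.trace(Ainv)
```

With 4000 samples the estimate lands well within the statistical error of the exact trace, but the slow O(1/sqrt(n)) convergence is exactly the "large number of samples" problem above.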
Selecting the vectors in x^T A^{-1} x

Random:
    x in Z_2^N: best variance for real matrices (Hutchinson 1989)
    x = randn(N, 1): worse variance than Z_2
    x = e_i: variance depends only on diag(A^{-1}); a single large element can dominate
    x = F^T e_i, with F = DFT or Hadamard: mixing of diagonal elements (Toledo et al. 2010)
Deterministic:
    x = H^T e_i, i = 1, ..., 2^k: Hadamard vectors in natural order (Bekas et al. 2007)
    x_m(i) = 1 if i in C_m, 0 otherwise: probing; assumes a multicolored graph (Tang et al. 2011)
Random-deterministic:
    Hierarchical probing for lattices (A.S., J.L. 2013)
Variance of the estimators

Rademacher vectors x_i in Z_2^N:
    T = (1/s) Sum_{i=1}^{s} x_i^T A^{-1} x_i,
    Var(T) = (2/s) ||offdiag(A^{-1})||_F^2 = (2/s) Sum_{i != j} (A^{-1}_{ij})^2

Diagonal sampling x = e_{j(i)}:
    T = (N/s) Sum_{i=1}^{s} A^{-1}_{j(i),j(i)},
    Var(T) = (N^2/s) Var(diag(A^{-1}))

[Figure: magnitudes of the entries of A^{-1} and the resulting variance]
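The Rademacher variance formula can be checked exactly on a tiny symmetric matrix by enumerating all 2^N sign vectors (a sketch; the particular 3x3 matrix is an arbitrary choice):

```python
import numpy as np
from itertools import product

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
Ainv = np.linalg.inv(A)

# theoretical per-sample variance: 2 * sum_{i != j} (A^{-1}_{ij})^2
off = Ainv - np.diag(np.diag(Ainv))
var_theory = 2.0 * np.sum(off ** 2)

# exact mean and variance of x^T A^{-1} x: enumerate all 2^3
# Rademacher sign vectors, each occurring with probability 1/8
vals = [np.array(s) @ Ainv @ np.array(s)
        for s in product([-1.0, 1.0], repeat=3)]
mean_exact = np.mean(vals)   # should equal Tr(A^{-1})
var_exact = np.var(vals)     # should equal var_theory
```

The enumeration confirms both that the estimator is unbiased and that its per-sample variance is exactly twice the squared Frobenius norm of the off-diagonal part of A^{-1}.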
Approximating diag(A^{-1}) from ILU

Consider an incomplete LU factorization of A: [L, U] = ILU(A).
If U^{-1} L^{-1} is a good approximation to A^{-1}, then compute the trace from
    M = diag(U^{-1} L^{-1}).
Computing M needs only one pass over L, U (Erisman, Tinney '75).
Let E = U^{-1} L^{-1} - A^{-1}. In some cases Tr(E) can be sufficiently close to zero.
However, what if |Tr(E)| is not small?
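A brute-force way to form M = diag(U^{-1} L^{-1}), column by column rather than via the one-pass Erisman-Tinney recurrence, can be sketched with SciPy's incomplete LU; the tridiagonal test matrix and the drop tolerance are assumptions for illustration only:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# small diagonally dominant test matrix (illustration only)
N = 50
A = sp.diags([-1.0, 2.2, -1.0], [-1, 0, 1], shape=(N, N), format='csc')

# incomplete LU; natural ordering so a tridiagonal matrix has no
# fill-in and nothing above the drop tolerance is discarded
ilu = spla.spilu(A, drop_tol=1e-4, permc_spec='NATURAL')

# M_j = (U^{-1} L^{-1} e_j)_j, one solve per column; the
# Erisman-Tinney recurrence computes the same M in a single
# pass over the factors L, U
I = np.eye(N)
M = np.array([ilu.solve(I[:, j])[j] for j in range(N)])

exact = np.diag(np.linalg.inv(A.toarray()))
```

For this matrix the ILU is essentially a complete LU, so M reproduces diag(A^{-1}) almost exactly; with a genuinely incomplete factorization M only tracks the pattern of the true diagonal, which is the situation the next slides address.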
ILU gives more info

Observation: even if Tr(E) is large, M may approximate the pattern of diag(A^{-1}), and/or E may have smaller variance.
Example: a small Laplacian and DW2048.
[Figure: sorted diagonal elements of the inverse of A, the inverse of the ILU, and their difference]
Capture the pattern better by fitting p(M) to diag(A^{-1})

Find p(): min ||p(M) - diag(A^{-1})|| on a set of m indices.
• Induce smoothness on M by sorting
• Use m equispaced indices to capture the range of values
• Compute A^{-1}_{jj} at these indices
• Fit M_j to A^{-1}_{jj} using MATLAB's LinearModel.stepwise
When ILU is a good preconditioner, Tr(p(M)) can be accurate to O(1e-3)!
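The steps above can be sketched with an ordinary polynomial least-squares fit standing in for the stepwise regression; the quadratic relation in the demo data is an assumption, chosen so the fit recovers the diagonal exactly:

```python
import numpy as np

def fit_diag(M_sorted, exact_samples, idx, deg=2):
    """Fit a polynomial p so that p(M_j) ~ A^{-1}_{jj} on the m
    sampled indices, then evaluate p on all of M.  np.polyfit is a
    simple stand-in for the stepwise regression used in the talk."""
    coeffs = np.polyfit(M_sorted[idx], exact_samples, deg)
    return np.polyval(coeffs, M_sorted)

# demo: a sorted "ILU diagonal" M and a synthetic diag(A^{-1})
# that is an exact quadratic in M (assumption, for a clean check)
rng = np.random.default_rng(0)
M = np.sort(rng.uniform(1.0, 3.0, size=200))
true_diag = 0.1 * M**2 + 0.5 * M + 0.05

m = 20
idx = np.linspace(0, len(M) - 1, m, dtype=int)  # m equispaced indices
fitted = fit_diag(M, true_diag[idx], idx, deg=2)
```

Sorting is what makes a low-degree fit plausible: it turns the scattered diagonal values into a smooth monotone curve that a few equispaced samples can pin down.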
Examples of fitting

TOLS4000, DW2048, af23560, conf6
[Figures: sorted diag(A^{-1}), diag((LU)^{-1}), and the fitted diag((LU)^{-1}) for each matrix; the conf6 panel plots real vs. imaginary parts of the diagonal elements]
Improving on Tr(M) and Tr(p(M))

• MC on E = M - diag(A^{-1}): potentially smaller variance on the diagonal
• MC on E2 = p(M) - diag(A^{-1}): m inversions for the fitting, s - m inversions for MC; further variance improvement
• MC with importance sampling based on M or p(M)

Or is traditional Hutchinson better?
• MC with Z_2^N on A^{-1}
• MC with Z_2^N on E
Depends on the approximation properties of the ILU.
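The second bullet, the fitted trace plus a Monte Carlo correction on E2, can be sketched as follows; `diag_entry(j)` is a hypothetical stand-in for an exact solve yielding A^{-1}_{jj}:

```python
import numpy as np

def trace_with_correction(diag_entry, pM, s, seed=None):
    """Tr(A^{-1}) ~ Tr(p(M)) + (N/s) * sum of s sampled entries of
    E2 = diag(A^{-1}) - p(M).  The correction is unbiased, and its
    variance shrinks as p(M) tracks the true diagonal more closely."""
    rng = np.random.default_rng(seed)
    N = len(pM)
    js = rng.integers(0, N, size=s)
    corr = (N / s) * sum(diag_entry(j) - pM[j] for j in js)
    return pM.sum() + corr

# demo: if p(M) already equals diag(A^{-1}), the correction vanishes
# and the estimate is exact for any sample size
d = np.array([0.5, 0.4, 0.3, 0.7])
est = trace_with_correction(lambda j: d[j], d.copy(), s=2, seed=1)
```

This is the whole point of the fitting: the MC samples only have to account for the residual E2, not for diag(A^{-1}) itself.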
Experiments: AF23560, conf6 (kappa = kappa_c - 10^{-8})

[Figures: trace estimates vs. number of samples for MC on diag(A^{-1}), diag(E), diag(E2), A^{-1}, and E, compared against the true Tr(A^{-1})]
Experiments: TOLS4000 with ILU(A + 10I), DW2048 with ILU(A + 0.01I)

[Figures: trace estimates vs. number of samples for the same five methods, compared against the true Tr(A^{-1})]
Experiments: fitting allows ILU(A + sigma*I)

Often ILU on A itself is not possible: A may be too ill-conditioned, or the factorization too expensive.
Better results are obtained by using a better-conditioned ILU(A + sigma*I) and letting the fitting fix the diagonal.
[Figure: matrix tols4000, standard deviation of each estimator as the shift sigma varies]
Experiments: sometimes Z_2^N is better

QCD matrix (49K) close to kappa_c; EPB3.
[Figures: trace estimates vs. number of samples for MC on diag(E2), A^{-1}, and E]
Here E had smaller off-diagonal variance; this is not easily predictable by theory.
Dynamically identifying the smallest variance

For every fitting point i = 1, ..., m, compute a_i = A^{-1} e_i.
• Based on a_ii = A^{-1}_ii, update estimates for Var(diag(A^{-1})), Var(diag(E)) (from a_ii - M_i), and Var(diag(E2)) (from a_ii - p(M_i))
• Use ||a_i||^2 - a_ii^2 to update the estimate for Var(MC on A^{-1}) = ||offdiag(A^{-1})||_F^2
• Compute mu_i = U^{-1} L^{-1} e_i and update the estimate for Var(MC on E) (from ||a_i - mu_i||^2 - (a_ii - mu_ii)^2)
Large differences between the methods show up after only a few points.
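The bookkeeping on this slide can be sketched as follows; the function name, the dictionary keys, and the scaling conventions (variances reported without the common 1/s factor) are assumptions. With m = N, i.e. every column probed, the estimates become exact, which is what the demo checks:

```python
import numpy as np

def variance_probes(a, mu, idx, M, pM, N):
    """From m probed columns a_i = A^{-1} e_i and mu_i = U^{-1} L^{-1} e_i,
    estimate the per-sample variance driving each estimator.
    a, mu: N x m arrays of columns; idx: the m probed column indices."""
    m = len(idx)
    a_ii = a[idx, np.arange(m)]
    mu_ii = mu[idx, np.arange(m)]
    return {
        # e_j sampling variances (up to the common N^2/s factor)
        'diag(A^-1)': np.var(a_ii),
        'diag(E)':    np.var(a_ii - M[idx]),
        'diag(E2)':   np.var(a_ii - pM[idx]),
        # Z_2 sampling: 2*||offdiag||_F^2, estimated from probed columns
        'Z2 on A^-1': 2.0 * (N / m) * np.sum(np.sum(a**2, axis=0)
                                             - a_ii**2),
        'Z2 on E':    2.0 * (N / m) * np.sum(np.sum((a - mu)**2, axis=0)
                                             - (a_ii - mu_ii)**2),
    }

# demo: probe all N columns so the estimates are exact; the "ILU
# inverse" is a stand-in, the exact inverse shifted on the diagonal
N = 4
A = np.array([[4.0, 1.0, 0.0, 0.0],
              [1.0, 3.0, 1.0, 0.0],
              [0.0, 1.0, 2.0, 1.0],
              [0.0, 0.0, 1.0, 3.0]])
Ainv = np.linalg.inv(A)
Approx = Ainv + 0.01 * np.eye(N)       # stand-in for U^{-1} L^{-1}
idx = np.arange(N)
est = variance_probes(Ainv, Approx, idx, np.diag(Approx),
                      np.diag(Approx), N)
```

Because E here is a constant shift of the diagonal, both the diag(E) and the Z_2-on-E variances come out zero, while the Z_2-on-A^{-1} estimate matches 2*||offdiag(A^{-1})||_F^2 exactly: a few cheap probes already separate the methods.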