PMAA July 4, 2014 Accurate computation of smallest singular values using the PRIMME eigensolver Lingfei Wu Andreas Stathopoulos Computer Science Department College of William and Mary Acknowledgment: National Science Foundation, DOE Scidac
The ubiquitous SVD

Mostly for largest singular values:
• Graph/Data/Text mining
• Image processing
• Model reduction in PDEs
• Variance reduction in Monte Carlo

But also for smallest singular values:
• similar applications, but for functions of matrices (e.g., A^{-1}): low-rank preconditioning, variance reduction
• computation of the pseudospectrum
• sensitivity analysis
Solving the large, sparse SVD problem

Find the k smallest singular values and corresponding left and right singular vectors of a large, sparse A ∈ ℜ^{m×n}:

    A v_i = σ_i u_i,   σ_1 ≤ ... ≤ σ_k

Solution approaches:
• A Hermitian eigenvalue problem on
  – the normal equations matrix C = A^T A or C = A A^T
  – the augmented matrix B = [ 0  A^T ; A  0 ]
• Lanczos bidiagonalization method (LBD): A = P B_d Q^T, where B_d = X Σ Y^T, U = P X, and V = Q Y
• Jacobi-Davidson for the SVD (JDSVD): uses B, but with a projection based on two subspaces
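The equivalence between the SVD and the two eigenproblems above can be checked numerically. A minimal NumPy sketch (the matrix and sizes are illustrative, not from the talk's test set):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 5
A = rng.standard_normal((m, n))

# Singular values of A directly (descending order)
sigma = np.linalg.svd(A, compute_uv=False)

# Eigenvalues of the normal-equations matrix C = A^T A are sigma_i^2
C = A.T @ A
evC = np.sort(np.linalg.eigvalsh(C))[::-1]

# Eigenvalues of the augmented matrix B = [0 A^T; A 0] are +/- sigma_i,
# plus m - n zeros when m > n
B = np.block([[np.zeros((n, n)), A.T], [A, np.zeros((m, m))]])
evB = np.sort(np.linalg.eigvalsh(B))[::-1]

print(np.allclose(evC, sigma**2))    # True
print(np.allclose(evB[:n], sigma))   # True: largest n eigenvalues of B
```

This also shows why the augmented matrix is attractive for accuracy: it carries σ_i directly rather than σ_i², at the cost of a problem of dimension m + n.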
Differences between methods: Asymptotic Krylov

Theorem. A Krylov method computing σ_n² on C is asymptotically twice as fast as one computing σ_n on B.
Theorem. A Krylov method computing σ_1² on C is always asymptotically faster than one computing σ_1 on B.
Differences between methods: Krylov degree

Lanczos on C builds:  V_k = K_k(A^T A, v_1)
LBD builds:           V_k = K_k(A^T A, v_1),  U_k = K_k(A A^T, A v_1)
JDSVD builds:         U_k = K_{k/2}(A A^T, u_1) ⊕ K_{k/2}(A A^T, A v_1)
                      V_k = K_{k/2}(A^T A, v_1) ⊕ K_{k/2}(A^T A, A^T u_1)
Lanczos on B builds K_k(B, [v_1; u_1]). If u_1 = 0,

    [U_k; V_k] = [K_{k/2}(A A^T, A v_1); 0] ⊕ [0; K_{k/2}(A^T A, v_1)]

LBD and Lanczos on C achieve twice the Krylov degree of the augmented approaches
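The half-degree effect comes from a block alternation: multiplying B against a vector whose u-block is zero swaps all the weight between the two blocks at every step, so k multiplications with B yield only degree-k/2 Krylov spaces in each block. A small NumPy check (sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 7, 4
A = rng.standard_normal((m, n))
B = np.block([[np.zeros((n, n)), A.T], [A, np.zeros((m, m))]])

# Start the Krylov sequence on B with u_1 = 0: x0 = [v_1; 0]
v1 = rng.standard_normal(n)
x = np.concatenate([v1, np.zeros(m)])

# B [v; 0] = [0; A v] and B [0; u] = [A^T u; 0], so the sequence alternates
for step in range(4):
    x = B @ x
    v_part, u_part = x[:n], x[n:]
    if step % 2 == 0:      # odd powers B^1, B^3: all weight in the u-block
        assert np.allclose(v_part, 0)
    else:                  # even powers B^2, B^4: all weight in the v-block
        assert np.allclose(u_part, 0)
print("alternation confirmed")
```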
Differences between methods: Accuracy

Squaring C = A^T A (even implicitly) limits the attainable accuracy:
• Eigenmethod on C achieves ‖r‖ only up to O( κ(A) ‖A‖ ε_mach )
• LBD on A achieves ‖r‖ of O( ‖A‖ ε_mach )
• Eigenmethod on B achieves ‖r‖ of O( ‖A‖ ε_mach )

Only LBD seems to be both efficient and accurate...
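The accuracy limit of working on C can be demonstrated on a small dense example with known singular values (the construction below, with κ(A) = 1e8, is illustrative; the talk's argument applies to sparse iterative methods as well):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
# Orthogonal factors times a known spectrum: sigma_min = 1e-8, kappa = 1e8
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = np.logspace(0, -8, n)
A = U @ np.diag(s) @ V.T

# Smallest singular value via the SVD of A vs. via eigenvalues of C = A^T A
smin_svd = np.linalg.svd(A, compute_uv=False).min()
lam_min = np.linalg.eigvalsh(A.T @ A).min()
smin_C = np.sqrt(max(lam_min, 0.0))   # lam_min may even come out negative

err_svd = abs(smin_svd - 1e-8) / 1e-8
err_C = abs(smin_C - 1e-8) / 1e-8
print(err_svd)   # typically ~1e-6 or better: near full absolute accuracy
print(err_C)     # typically O(1): squaring destroys the smallest sigma
```

Working on A keeps the absolute error at O(‖A‖ ε_mach), so σ_min is still resolved; on C the error O(‖A‖² ε_mach) is as large as σ_min² itself.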
LBD

Advantages of LBD:
• Same space as Lanczos on C, but accurate since it works on A
• Lanczos-style global convergence to many singular triplets

Drawbacks of LBD:
• Orthogonality loss, large memory demands ⇒ restarting needed
• Interior spectral information ⇒ harmonic/refined procedures ⇒ irregular and slow convergence
• No general-purpose software for recent advances in LBD
• Cannot use preconditioning
JDSVD

Advantages of JDSVD:
• Can take advantage of preconditioning
• More flexible than eigenmethods on B
• Accurate

Drawbacks of JDSVD:
• Correction equation on B is maximally indefinite
• Outer iteration is slow
• Only a MATLAB implementation exists
What is missing in real-world software

Computing the smallest SVs is an extremely challenging task:
• large sparse matrix ⇒ no shift-invert
• slow convergence ⇒ effective restarting and preconditioning needed
• Currently only unpreconditioned SVD software:
  – SVDPACK: Lanczos and trace-min on B or C, only largest SVs
  – PROPACK: LBD for largest SVs, shift-invert for smallest SVs
  – SLEPc: no preconditioning, still in development
⇒ calls for a fully functional, highly optimized SVD solver

PRIMME: PReconditioned Iterative MultiMethod Eigensolver
Building an SVD solver on PRIMME

PRIMME: PReconditioned Iterative MultiMethod Eigensolver
• Over 12 eigenmethods, including the near-optimal GD+k and JDQMR methods
• Extensive support for interior eigenvalues:
  – accepts initial guesses for all required eigenvectors
  – accepts many shifts and finds the closest eigenvalue to each shift
• Accepts any preconditioner for M ≈ C^{-1} or M ≈ B^{-1}, or:
  – if M ≈ A^{-1}, uses M M^T ≈ C^{-1} and [ 0  M ; M^T  0 ] ≈ B^{-1}
  – if M ≈ (A^T A)^{-1} (e.g., from RIF), uses M ≈ C^{-1} and [ 0  M A^T ; A M  0 ] ≈ B^{-1}
• Robust framework: subspace acceleration, locking, block methods
• Parallel, high-performance implementation
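The block-preconditioner relations become exact identities when M is the exact inverse, which makes them easy to sanity-check (a small dense example, purely illustrative; in practice M is only an approximation):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
A = rng.standard_normal((n, n)) + 3 * np.eye(n)   # well-conditioned square A
M = np.linalg.inv(A)                              # exact "preconditioner" M = A^{-1}

C = A.T @ A
B = np.block([[np.zeros((n, n)), A.T], [A, np.zeros((n, n))]])

# M M^T = A^{-1} A^{-T} = (A^T A)^{-1} = C^{-1}
print(np.allclose(M @ M.T, np.linalg.inv(C)))     # True

# [0 M; M^T 0] = [0 A^{-1}; A^{-T} 0] = B^{-1}
Binv = np.block([[np.zeros((n, n)), M], [M.T, np.zeros((n, n))]])
print(np.allclose(Binv, np.linalg.inv(B)))        # True
```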
Methods to use. Why not LBD?

[Figure: pde2961, smallest singular values without restarting. Convergence of ‖A^T u − σ v‖ vs. matrix-vector products for GD on C, GD on B, JDSVD, and LANSVD, down to ‖A‖·1e-10.]
Restarting changes the game

[Figure: pde2961, smallest singular values with restarting. Convergence of ‖A^T u − σ v‖ vs. matrix-vector products for GD+1 on C, GD+1 on B, JDSVD+1-Refined, and IRRHLB, down to ‖A‖·1e-10.]
primme_svds: A two-stage strategy

• GD+k on C (extremal eigenvalues, near-optimal restarting) is best for a few smallest SVs
• or JDQMR on C if A is very sparse
• Limited by accuracy

Our solution: a hybrid, two-stage singular value method
• Stage I: work on C to residual tolerance max( σ̃_i δ_user ‖A‖, ‖A‖² ε_mach )
• Stage II: work on B to improve the approximations from C to δ_user ‖A‖
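The two-stage control flow can be sketched on a toy dense problem. This is not the PRIMME implementation: dense eigensolves and direct solves stand in for the iterative GD+k/JDQMR kernels, and the function name and the 10·ε shift offset are my own choices for the sketch.

```python
import numpy as np

def two_stage_smallest_triplet(A):
    """Toy dense sketch of the hybrid strategy: stage I extracts sigma, v
    from C = A^T A up to its accuracy limit; stage II refines on the
    augmented B starting from the stage-I approximations."""
    m, n = A.shape
    eps = np.finfo(float).eps
    normA = np.linalg.norm(A, 2)

    # --- Stage I: smallest eigenpair of C, accuracy limited by squaring ---
    C = A.T @ A
    lam, W = np.linalg.eigh(C)
    sigma, v = np.sqrt(max(lam[0], 0.0)), W[:, 0]
    u = A @ v / sigma

    # --- Stage II: refine on B = [0 A^T; A 0] from the stage-I guesses ---
    B = np.block([[np.zeros((n, n)), A.T], [A, np.zeros((m, m))]])
    x = np.concatenate([v, u]) / np.sqrt(2.0)     # eigenvector guess for +sigma
    for _ in range(3):                            # inverse iteration stands in
        shift = sigma + 10 * eps * normA          # for JDQMR in this sketch
        x = np.linalg.solve(B - shift * np.eye(m + n), x)
        x /= np.linalg.norm(x)
        sigma = x @ (B @ x)                       # Rayleigh quotient update
    v, u = x[:n], x[n:]
    return sigma, v / np.linalg.norm(v), u / np.linalg.norm(u)
```

Usage on a random matrix recovers the smallest singular value to near machine precision, beyond what stage I alone can deliver for ill-conditioned A.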
primme_svds: an example

A = diag([1:10 1000:100:1e6]), Prec = A + rand(1,10000)*1e4

[Figure: convergence of ‖A^T u − σ v‖ vs. matrix-vector products for three runs: the augmented B (OAAO) alone; A^T A run to its accuracy limit; and A^T A run to ε‖A‖², then switching to the augmented B.]
Stage I of primme_svds (C)

• Flexible stopping criterion
Let (σ̃, ũ, ṽ) be a targeted singular triplet of A, and define

    r_u = A^T ũ − σ̃ ṽ,          r_v = A ṽ − σ̃ ũ,
    r_C = C ṽ − σ̃² ṽ,           r_B = B [ṽ; ũ]/√2 − σ̃ [ṽ; ũ]/√2.

If ‖ṽ‖ = 1 and ‖ũ‖ = ‖A ṽ / σ̃‖ = 1, then r_v = 0 and

    ‖r_u‖ = ‖r_C‖ / σ̃ = √2 ‖r_B‖.

Thus, the stopping criterion for the methods on C becomes

    δ_C = max( δ_user σ̃ / ‖A‖, ε_mach ).

• Explicit Rayleigh-Ritz of converged eigenpairs improves the directions of individual eigenvectors
Stage II of primme_svds (B)

Excellent initial guesses (shifts, eigenvectors) from C ⇒ use JDQMR
Subspace acceleration and optimal stopping in JDQMR ⇒ better than iterative refinement
Irregular convergence of Rayleigh-Ritz (RR) on B ⇒ enhance PRIMME with a refined projection ( min ‖B V y − σ̃ V y‖ )
  – only one column update of the QR factorization per step
  – full QR factorization only at restart
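A refined extraction computes y minimizing ‖B V y − σ̃ V y‖ over unit y, i.e., the right singular vector of (B − σ̃ I)V for its smallest singular value. A minimal sketch (computed here by a full SVD; PRIMME instead maintains the QR factorization incrementally, one column per step, as noted above):

```python
import numpy as np

def refined_ritz(B, V, sigma_tilde):
    """Refined extraction: y = argmin_{||y||=1} ||B V y - sigma_tilde V y||,
    via the SVD of W = (B - sigma_tilde I) V.  Returns y and the attained
    residual norm.  (Sketch only; not the incremental-QR implementation.)"""
    W = B @ V - sigma_tilde * V
    _, s, Yt = np.linalg.svd(W, full_matrices=False)
    return Yt[-1], s[-1]

# Tiny usage example (illustrative): refine inside a random subspace that
# happens to contain the wanted eigenvector exactly.
rng = np.random.default_rng(6)
n = 12
Q = np.linalg.qr(rng.standard_normal((n, n)))[0]
B = Q @ np.diag(np.arange(1.0, n + 1)) @ Q.T      # known eigenpairs
target = Q[:, 0]                                  # eigenvector for lambda = 1
V = np.linalg.qr(np.column_stack([target, rng.standard_normal((n, 3))]))[0]
y, res = refined_ritz(B, V, 1.0)
print(np.linalg.norm(B @ (V @ y) - (V @ y)))      # ~0: exact direction found
```

Unlike Rayleigh-Ritz, this extraction cannot return a spurious direction with a large residual, which is what restores regular convergence on the indefinite B.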
Implementation of primme_svds
• Developed PRIMME MEX, a MATLAB interface for PRIMME
• The UI extends MATLAB's eigs() and svds() to PRIMME's full functionality
• External stopping criterion and refined projection implemented in PRIMME
Evaluation: Test matrices

Rectangular:
  Matrix     well1850   plddb    lp_bnl2
  rows m     1850       3049     2324
  cols n     712        5069     4486
  κ(A)       1.1e2      1.2e4    7.8e3
  ‖A‖₂       1.8e0      1.4e2    2.1e2

Square:
  Matrix     dw2048   fidap4   jagmesh8   lshp3025
  order      2048     1601     1141       3025
  κ(A)       5.3e3    5.2e3    5.9e4      2.2e5
  ‖A‖₂       1.0e0    1.6e0    6.8e0      7.0e0

We compare against:
• JDSVD (Hochstenbach, 2001)
• IRRHLB (Jia, 2010)
• SVDIFP (Ye, 2014)
• IRLBA (Baglama, 2005)
• MATLAB svds() (ARPACK, 1998)
No preconditioning — first stage only — 1 or 10 smallest triplets

[Figure: number of matrix-vector products for the 1 (left) and 10 (right) smallest triplets without preconditioning, tol = 1e-8, for primme_svds, IRRHLB, JDSVD, IRLBA, and SVDIFP on lshp3025, jagmesh8, dw2048, lp_bnl2, plddb, well1850.]

primme_svds: fastest and most robust
No preconditioning — 1e-14 accuracy — 1 or 10 smallest triplets

[Figure: number of matrix-vector products for the 1 (left) and 10 (right) smallest triplets without preconditioning, tol = 1e-14, for primme_svds(gd+k), primme_svds(jdqmr), IRRHLB, JDSVD, IRLBA, and SVDIFP on lshp3025, jagmesh8, dw2048, lp_bnl2, plddb, well1850.]

primme_svds: fastest and most robust at high accuracy
With preconditioning — 1e-8/1e-14 — 10 smallest triplets

[Figure: number of matrix-vector products for the 10 smallest triplets with P = ilutp(1e-3) or RIF(1e-3); left: tol = 1e-8 (primme_svds, JDSVD, SVDIFP); right: tol = 1e-14 (primme_svds(gd+k), primme_svds(jdqmr), JDSVD, SVDIFP) on fidap4, jagmesh8, lshp3025, deter4, plddb, lp_bnl2.]

primme_svds: significantly more robust and efficient
Note JDSVD's problem with rectangular matrices
With a shift-and-invert operator

[Figure: number of matrix-vector products with the shift-and-invert technique for the 10 smallest triplets: primme_svds(C⁻¹), MATLAB svds(B⁻¹), and svdifp(C⁻¹) on fidap4, jagmesh8, lshp3025, deter4, plddb, lp_bnl2.]

primme_svds on C is faster than MATLAB svds