
PMAA July 4, 2014 Accurate computation of smallest singular values using the PRIMME eigensolver Lingfei Wu Andreas Stathopoulos Computer Science Department College of William and Mary Acknowledgment: National Science Foundation, DOE Scidac



  2. The ubiquitous SVD
  Mostly for largest singular values:
  • Graph/data/text mining
  • Image processing
  • Model reduction in PDEs
  • Variance reduction in Monte Carlo

  3. The ubiquitous SVD
  Mostly for largest singular values:
  • Graph/data/text mining
  • Image processing
  • Model reduction in PDEs
  • Variance reduction in Monte Carlo
  But also for smallest singular values:
  • Similar applications, but for functions of matrices (e.g., A^{-1}):
    – low-rank preconditioning
    – variance reduction
  • Computation of the pseudospectrum
  • Sensitivity analysis

  4. Solving the large, sparse SVD problem
  Find the k smallest singular values and corresponding left and right singular
  vectors of a large, sparse A ∈ ℜ^{m×n}:
      A v_i = σ_i u_i,   σ_1 ≤ ... ≤ σ_k
  Solution approaches:
  • A Hermitian eigenvalue problem on
    – the normal equations matrix C = A^T A or C = A A^T
    – the augmented matrix B = [ 0  A^T ; A  0 ]
  • Lanczos bidiagonalization method (LBD): A = P B_d Q^T, where
    B_d = X Σ Y^T, U = P X, and V = Q Y
  • Jacobi-Davidson for SVD (JDSVD): uses B, but with a projection based on
    two subspaces
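The two eigenvalue reformulations above can be checked numerically. A minimal dense numpy sketch (for illustration only) verifying that C = A^T A has eigenvalues σ_i² and that B = [ 0 A^T ; A 0 ] has eigenvalues ±σ_i plus |m − n| zeros:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 5
A = rng.standard_normal((m, n))
sigma = np.linalg.svd(A, compute_uv=False)   # singular values, descending

# Normal-equations matrix C = A^T A: eigenvalues are sigma_i^2
C = A.T @ A
eig_C = np.sort(np.linalg.eigvalsh(C))[::-1]
assert np.allclose(eig_C, sigma**2)

# Augmented matrix B = [[0, A^T], [A, 0]]: eigenvalues are +/- sigma_i
# plus |m - n| zeros
B = np.block([[np.zeros((n, n)), A.T], [A, np.zeros((m, m))]])
eig_B = np.sort(np.linalg.eigvalsh(B))
expected = np.sort(np.concatenate([sigma, -sigma, np.zeros(m - n)]))
assert np.allclose(eig_B, expected)
```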

  5. Differences between methods: asymptotic Krylov convergence
  Theorem. A Krylov method computing σ_n^2 on C is asymptotically twice as
  fast as one computing σ_n on B.
  Theorem. A Krylov method computing σ_1^2 on C is always asymptotically
  faster than one computing σ_1 on B.

  6. Differences between methods: Krylov degree
  Lanczos on C builds: V_k = K_k(A^T A, v_1)
  LBD builds: V_k = K_k(A^T A, v_1), U_k = K_k(A A^T, A v_1)
  JDSVD builds: U_k = K_{k/2}(A A^T, u_1) ⊕ K_{k/2}(A A^T, A v_1),
                V_k = K_{k/2}(A^T A, v_1) ⊕ K_{k/2}(A^T A, A^T u_1)
  Lanczos on B builds K_k(B, [v_1; u_1]). If u_1 = 0,
      K_k(B, [v_1; 0]) = [ K_{k/2}(A^T A, v_1) ; 0 ] ⊕ [ 0 ; K_{k/2}(A A^T, A v_1) ]
  ⇒ LBD and Lanczos on C reach twice the Krylov degree of the augmented
  approaches.
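The block-alternating structure of the Krylov space of B started from [v_1; 0] is easy to verify: even powers of B keep the vector in the V block (building K(A^T A, v_1)), odd powers in the U block (building K(A A^T, A v_1)). A small numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 7, 4
A = rng.standard_normal((m, n))
B = np.block([[np.zeros((n, n)), A.T], [A, np.zeros((m, m))]])

v1 = rng.standard_normal(n)
x = np.concatenate([v1, np.zeros(m)])    # start vector [v_1; 0]

# Odd powers of B live in the "U" block, even powers in the "V" block
x1 = B @ x          # = [0; A v_1]
x2 = B @ x1         # = [A^T A v_1; 0]
assert np.allclose(x1[:n], 0) and np.allclose(x1[n:], A @ v1)
assert np.allclose(x2[n:], 0) and np.allclose(x2[:n], A.T @ A @ v1)
```

So k matrix-vector products with B advance each block only k/2 Krylov degrees, which is the content of the comparison above.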

  7. Differences between methods: accuracy
  Squaring into C = A^T A (even implicitly) causes accuracy problems:
  • An eigenmethod on C achieves ||r|| only up to O(κ(A) ||A|| ε_mach)
  • LBD on A achieves ||r|| of O(||A|| ε_mach)
  • An eigenmethod on B achieves ||r|| of O(||A|| ε_mach)
  Only LBD seems to be both efficient and accurate...

  8. LBD
  Advantages of LBD:
  • Same space as Lanczos on C, but accurate, since it works on A
  • Lanczos-style global convergence to many singular triplets
  Drawbacks of LBD:
  • Orthogonality loss and large memory demands ⇒ restarting needed
  • Interior spectral information ⇒ harmonic/refined procedures ⇒ irregular
    and slow convergence
  • No general-purpose software for recent advances in LBD
  • Cannot use preconditioning
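For reference, a compact numpy sketch of Golub-Kahan bidiagonalization with full reorthogonalization (the basic iteration only, not the restarted production variants discussed above): it verifies the factorization relation A V_k = P_k B_k and that the Ritz values of the small bidiagonal B_k stay inside A's singular spectrum.

```python
import numpy as np

def golub_kahan(A, k, rng):
    """k steps of Golub-Kahan bidiagonalization with full reorthogonalization.
    Produces A V_k = P_k B_k with B_k upper bidiagonal (alpha on the diagonal,
    beta on the superdiagonal)."""
    m, n = A.shape
    V = np.zeros((n, k + 1)); P = np.zeros((m, k))
    alpha = np.zeros(k); beta = np.zeros(k)
    v = rng.standard_normal(n); V[:, 0] = v / np.linalg.norm(v)
    for j in range(k):
        p = A @ V[:, j] - (beta[j - 1] * P[:, j - 1] if j > 0 else 0)
        p -= P[:, :j] @ (P[:, :j].T @ p)            # reorthogonalize
        alpha[j] = np.linalg.norm(p); P[:, j] = p / alpha[j]
        r = A.T @ P[:, j] - alpha[j] * V[:, j]
        r -= V[:, :j + 1] @ (V[:, :j + 1].T @ r)    # reorthogonalize
        beta[j] = np.linalg.norm(r); V[:, j + 1] = r / beta[j]
    B = np.diag(alpha) + np.diag(beta[:-1], 1)
    return P, V[:, :k], B

rng = np.random.default_rng(3)
A = rng.standard_normal((30, 20))
P, V, B = golub_kahan(A, 10, rng)
# The bidiagonal factorization relation holds to working precision
assert np.allclose(A @ V, P @ B)
# Ritz values of B_k lie inside [sigma_min(A), sigma_max(A)]
sv_B = np.linalg.svd(B, compute_uv=False)
sv_A = np.linalg.svd(A, compute_uv=False)
assert sv_B.max() <= sv_A.max() + 1e-8 and sv_B.min() >= sv_A.min() - 1e-8
```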

  9. JDSVD
  Advantages of JDSVD:
  • Can take advantage of preconditioning
  • More flexible than eigenmethods on B
  • Accurate
  Drawbacks of JDSVD:
  • The correction equation on B is maximally indefinite
  • Slow outer iteration
  • Only a MATLAB implementation exists

  10. What is missing in real-world software
  Computing the smallest SVs is an extremely challenging task:
  • large sparse matrix ⇒ no shift-and-invert
  • slow convergence ⇒ effective restarting and preconditioning needed
  • Currently only unpreconditioned SVD software exists:
    – SVDPACK: Lanczos and trace-min on B or C, only largest SVs
    – PROPACK: LBD for largest SVs, shift-and-invert for smallest SVs
    – SLEPc: no preconditioning, still in development
  ⇒ calls for a fully functional, highly optimized SVD solver:
  PRIMME: PReconditioned Iterative MultiMethod Eigensolver

  11. Building an SVD solver on PRIMME
  PRIMME: PReconditioned Iterative MultiMethod Eigensolver
  • Over 12 eigenmethods, including the near-optimal GD+k and JDQMR methods
  • Extensive support for interior eigenvalues:
    – accepts initial guesses for all required eigenvectors
    – accepts many shifts and finds the closest eigenvalue to each shift
  • Accepts any preconditioner for M ≈ C^{-1} or M ≈ B^{-1}; alternatively,
    – if M ≈ A^{-1}, it uses M M^T ≈ C^{-1} and [ 0  M ; M^T  0 ] ≈ B^{-1}
    – if M ≈ (A^T A)^{-1} (e.g., from RIF), it uses M ≈ C^{-1} and
      [ 0  M A^T ; A M  0 ] ≈ B^{-1}
  • Robust framework: subspace acceleration, locking, block methods
  • Parallel, high-performance implementation
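The preconditioner identities for the case M ≈ A^{-1} can be sanity-checked on a small dense square matrix; this is a sketch of the underlying algebra only, not of PRIMME's interface:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
A = rng.standard_normal((n, n)) + 3 * np.eye(n)   # square, well conditioned
M = np.linalg.inv(A)                              # stand-in for M ~= A^{-1}

C = A.T @ A
B = np.block([[np.zeros((n, n)), A.T], [A, np.zeros((n, n))]])

# M M^T reproduces C^{-1}, and the augmented form [0 M; M^T 0] reproduces B^{-1}
assert np.allclose(M @ M.T, np.linalg.inv(C))
Binv = np.block([[np.zeros((n, n)), M], [M.T, np.zeros((n, n))]])
assert np.allclose(Binv, np.linalg.inv(B))
```

In practice M would of course be an approximate inverse (e.g., an incomplete factorization), and the identities hold only approximately, which is all a preconditioner needs.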

  12. Methods to use. Why not LBD?
  [Convergence plot for pde2961, smallest singular values without restarting:
  ||A^T u − σ v|| vs. matrix-vector products, for GD on C, GD on B, JDSVD, and
  LANSVD, down to ||A||·1e-10.]

  13. Restarting changes the game
  [Convergence plot for pde2961, smallest singular values with restarting:
  ||A^T u − σ v|| vs. matrix-vector products, for GD+1 on C, GD+1 on B,
  JDSVD+1-Refined, and IRRHLB, down to ||A||·1e-10.]

  14. primme_svds: a two-stage strategy
  • GD+k on C: extremal eigenvalues and near-optimal restarting, best for a
    few smallest SVs
  • or JDQMR on C if A is very sparse
  • Limited by accuracy
  Our solution: a hybrid, two-stage singular value method
  • Stage I: work on C to residual tolerance max(σ̃_i δ_user ||A||, ||A||^2 ε_mach)
  • Stage II: work on B to improve the approximations from C to δ_user ||A||
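A minimal dense illustration of the two-stage idea (emphatically not the primme_svds implementation: stage II here uses plain shifted inverse iteration on B as a crude stand-in for JDQMR). Stage I on C stalls near √(||A||² ε_mach); stage II, started from the stage-I shift and vector, restores full accuracy.

```python
import numpy as np

def two_stage_smallest(A):
    """Toy two-stage scheme: eigensolve C = A^T A, then refine on the
    augmented B = [0 A^T; A 0] from the stage-I shift and vector."""
    m, n = A.shape
    # Stage I: smallest eigenpair of C -> accuracy limited to O(||A||^2 eps)
    lam, W = np.linalg.eigh(A.T @ A)
    v = W[:, 0]
    sig = np.sqrt(max(lam[0], 0.0))
    u = A @ v
    u /= np.linalg.norm(u)
    # Stage II: shifted inverse iteration on B (stand-in for JDQMR),
    # started from the stage-I singular vectors and shift
    B = np.block([[np.zeros((n, n)), A.T], [A, np.zeros((m, m))]])
    x = np.concatenate([v, u]) / np.sqrt(2)
    for _ in range(2):
        x = np.linalg.solve(B - 0.999 * sig * np.eye(m + n), x)
        x /= np.linalg.norm(x)
    return x @ (B @ x)                 # Rayleigh quotient on B

rng = np.random.default_rng(5)
s = np.array([1e-7, 1.0, 1.5, 2.0, 3.0])   # sigma_min = 1e-7
U, _ = np.linalg.qr(rng.standard_normal((5, 5)))
V, _ = np.linalg.qr(rng.standard_normal((5, 5)))
A = U @ np.diag(s) @ V.T
sig = two_stage_smallest(A)
assert abs(sig - 1e-7) < 1e-10   # stage II recovers full accuracy
```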

  15. primme_svds: an example
  A = diag([1:10 1000:100:1e6]), Prec = A + rand(1,10000)*1e4
  [Convergence plot: ||A^T u − σ v|| vs. matrix-vector products, for OAAO,
  ATA to limit, ATA to eps||A||^2, and ATA with switch to OAAO.]

  16. Stage I of primme_svds (C)
  • Flexible stopping criterion.
    Let (σ̃, ũ, ṽ) be a targeted singular triplet of A, x = [ṽ; ũ]/√2, and
        r_u = A^T ũ − σ̃ ṽ,   r_v = A ṽ − σ̃ ũ,
        r_C = C ṽ − σ̃^2 ṽ,   r_B = B x − σ̃ x.
    If ||ṽ|| = 1 and ||ũ|| = ||A ṽ / σ̃|| = 1, then r_v = 0 and
        ||r_u|| = ||r_C|| / σ̃ = √2 ||r_B||.
    Thus, the stopping criterion for the methods on C becomes
        δ_C = max(δ_user σ̃ / ||A||, ε_mach)
  • Explicit Rayleigh-Ritz of converged eigenpairs improves the directions of
    individual eigenvectors
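The residual relations r_v = 0 and ||r_u|| = ||r_C||/σ̃ = √2 ||r_B|| become exact algebraic identities once ũ = Aṽ/σ̃, and can be confirmed numerically with an approximate right singular vector:

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 9, 5
A = rng.standard_normal((m, n))

# An approximate right singular vector: exact one plus a small perturbation
_, _, Vt = np.linalg.svd(A)
v = Vt[-1] + 1e-3 * rng.standard_normal(n)
v /= np.linalg.norm(v)
sig = np.linalg.norm(A @ v)           # so that u = A v / sig has unit norm
u = A @ v / sig

r_u = A.T @ u - sig * v
r_v = A @ v - sig * u                 # = 0 by construction of u
r_C = (A.T @ A) @ v - sig**2 * v
x = np.concatenate([v, u]) / np.sqrt(2)   # unit-norm vector for B
B = np.block([[np.zeros((n, n)), A.T], [A, np.zeros((m, m))]])
r_B = B @ x - sig * x

assert np.allclose(r_v, 0)
assert np.isclose(np.linalg.norm(r_u), np.linalg.norm(r_C) / sig)
assert np.isclose(np.linalg.norm(r_u), np.sqrt(2) * np.linalg.norm(r_B))
```

These identities are what let the C-stage stop exactly when the true singular-triplet residual, not the eigenresidual of C, reaches the requested tolerance.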

  17. Stage II of primme_svds (B)
  • Excellent initial guesses (shifts, eigenvectors) from C ⇒ use JDQMR
  • Subspace acceleration and optimal stopping in JDQMR ⇒ better than
    iterative refinement
  • Irregular convergence of Rayleigh-Ritz (RR) on B ⇒ enhance PRIMME with a
    refined projection (min ||B V y − σ̃ V y||):
    – only one column update of the QR factorization per step
    – full QR factorization only at restart
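The refined extraction min ||B V y − σ̃ V y|| amounts to the smallest right singular vector of (B − σ̃ I)V, which can be read off from its R factor. A numpy sketch (the function name and toy basis are illustrative, and PRIMME updates the QR factorization one column at a time rather than recomputing it as done here):

```python
import numpy as np

def refined_ritz(Bop, V, shift):
    """Refined extraction: minimize ||B V y - shift * V y|| over unit y,
    via the SVD of the R factor of (B - shift I) V."""
    W = Bop @ V - shift * V
    _, R = np.linalg.qr(W)
    _, S, Yt = np.linalg.svd(R)
    return V @ Yt[-1], S[-1]          # refined vector, residual norm

rng = np.random.default_rng(7)
m, n = 8, 6
A = rng.standard_normal((m, n))
B = np.block([[np.zeros((n, n)), A.T], [A, np.zeros((m, m))]])

# Build a small basis that contains the eigenvector of B for +sigma_min
sv = np.linalg.svd(A, compute_uv=False)
eigvals, eigvecs = np.linalg.eigh(B)
target = eigvecs[:, np.argmin(np.abs(eigvals - sv.min()))]
V = np.linalg.qr(np.column_stack([target,
                                  rng.standard_normal((m + n, 3))]))[0]

x, res = refined_ritz(B, V, sv.min())
assert res < 1e-8                     # the exact eigenvector lies in span(V)
assert np.isclose(x @ (B @ x), sv.min())
```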

  18. Implementation of primme_svds
  • Developed PRIMME_MEX, a MATLAB interface for PRIMME
  • UI extends MATLAB's eigs() and svds() to PRIMME's full functionality
  • External stopping criterion and refined projection implemented in PRIMME

  19. Evaluation: test matrices

  Rectangular:
    Matrix     well1850   plddb    lp_bnl2
    rows m     1850       3049     2324
    cols n     712        5069     4486
    κ(A)       1.1e2      1.2e4    7.8e3
    ||A||_2    1.8e0      1.4e2    2.1e2

  Square:
    Matrix     dw2048   fidap4   jagmesh8   lshp3025
    order      2048     1601     1141       3025
    κ(A)       5.3e3    5.2e3    5.9e4      2.2e5
    ||A||_2    1.0e0    1.6e0    6.8e0      7.0e0

  We compare against:
  • JDSVD (Hochstenbach, 2001)
  • IRRHLB (Jia, 2010)
  • SVDIFP (Ye, 2014)
  • IRLBA (Baglama, 2005)
  • MATLAB svds() (ARPACK, 1998)

  20. No preconditioning — first stage only — 1 or 10 smallest triplets
  [Bar charts of matrix-vector products for primme_svds, IRRHLB, JDSVD, IRLBA,
  and SVDIFP on lshp3025, jagmesh8, dw2048, lp_bnl2, plddb, and well1850;
  tol = 1e-8, for 1 and 10 smallest singular triplets.]
  primme_svds: fastest and most robust

  21. No preconditioning — 1e-14 accuracy — 1 or 10 smallest triplets
  [Bar charts of matrix-vector products for primme_svds(gd+k),
  primme_svds(jdqmr), IRRHLB, JDSVD, IRLBA, and SVDIFP on the same six
  matrices; tol = 1e-14, for 1 and 10 smallest singular triplets.]
  primme_svds: fastest and most robust at high accuracy

  22. With preconditioning — 1e-8/1e-14 — 10 smallest triplets
  [Bar charts of matrix-vector products with P = ilutp(1e-3) or RIF(1e-3), on
  fidap4, jagmesh8, lshp3025, deter4, plddb, and lp_bnl2: primme_svds, JDSVD,
  and SVDIFP at tol = 1e-8; primme_svds(gd+k), primme_svds(jdqmr), JDSVD, and
  SVDIFP at tol = 1e-14.]
  primme_svds: significantly more robust and efficient
  Note JDSVD's problem with rectangular matrices

  23. With shift-and-invert operator
  [Bar chart of matrix-vector products with the shift-and-invert technique for
  the 10 smallest triplets, on the same six matrices: primme_svds(C^{-1}),
  MATLAB svds(B^{-1}), and svdifp(C^{-1}).]
  primme_svds on C is faster than MATLAB svds
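Shift-and-invert at zero on C is the mechanism behind this comparison: each iteration solves with C, which damps every unwanted component by (σ_min/σ_i)² per step. A dense numpy sketch, where the explicit inverse stands in for the sparse factorization a real solver would use:

```python
import numpy as np

rng = np.random.default_rng(8)
m, n = 12, 8
s = np.linspace(1.0, 3.0, n); s[0] = 0.1      # smallest singular value 0.1
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(s) @ V.T

# Shift-and-invert at zero on C = A^T A: inverse iteration damps every
# other component by (sigma_min / sigma_i)^2 per step
Cinv = np.linalg.inv(A.T @ A)     # stands in for a sparse factorization
v = rng.standard_normal(n)
for _ in range(10):
    v = Cinv @ v
    v /= np.linalg.norm(v)
sig = np.linalg.norm(A @ v)       # converged smallest singular value
assert np.isclose(sig, 0.1)
```

This is exactly why shift-and-invert is so fast when a factorization is affordable, and why the slide deck rules it out for genuinely large sparse A.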
