a matlab interface for primme for solving eigenvalue and
play

A MATLAB interface for PRIMME for solving Eigenvalue and Singular - PowerPoint PPT Presentation

SIAM CSE 2013 A MATLAB interface for PRIMME for solving Eigenvalue and Singular Value problems Andreas Stathopoulos and Lingfei Wu Computer Science Department College of William and Mary Acknowledgment: National Science Foundation, DOE SciDAC


  1. SIAM CSE 2013 A MATLAB interface for PRIMME for solving Eigenvalue and Singular Value problems Andreas Stathopoulos and Lingfei Wu Computer Science Department College of William and Mary Acknowledgment: National Science Foundation, DOE SciDAC

  2. The problems A large, sparse, Hermitian matrix: Find nev eigenvalues and corresponding eigenvectors Ax i = λ i x i A large, sparse, N × M matrix Find nev singular values and corresponding left and right singular vectors Av i = σ i u i SVD problem solved as a Hermitian eigenvalue problem on Normal equations A T A or AA T , Augmented matrix [ 0 A ; A T 0 ] [ 2 ]

  3. Available software Lanczos-based, no preconditioning • ARPACK (Sorensen, Lehoucq) • TRLAN (Wu, Simon) • Industrial strength Lanczos (Grimes, Lewis, Simon) Preconditioned eigensolver packages with various methods • ANASAZI (Baker, Thornquist, Lehoucq, Hetmaniuk) • PRIMME (AS, J.M.) • SLEPc (Roman et al.) Specific method preconditioned eigensolver software • BLOPEX (Knyazev) • JADAMILU (Bollhoefer, Notay) [ 3 ]

  4. A.S. and J. R. McCombs (2006) PRIMME: PReconditioned Iterative MultiMethod Eigensolver • Near optimality through GD+k and JDQMR methods • Over 12 methods accessible through PRIMME. • Dynamic choice between best methods • Block versions of methods • Interior eigenvalues too • Full set of defaults for non expert users • Full customizability for expert users • Parallel, high performance implementation • C and Fortran interfaces, real and complex • Accessible also in SLEPc Download: www.cs.wm.edu/ ∼ andreas [ 4 ]

  5. PRIMME shown robust and efficient Matvec ratios over PRIMME’s GD+1 Time ratios over JDqmr 10 PRIMME JDQMR PRIMME JDQMR 12 9 PRIMME GD+k PRIMME GD+k Anasazi IRTR Anasazi IRTR 8 Anasazi LOBPCG Anasazi LOBPCG 10 Anasazi BD Anasazi BD 7 8 6 5 6 4 4 3 2 2 1 0 0 Andrews Lap7p1M Plate33K cfd1 or56f Andrews Lap7p1M Plate33K cfd1 or56f Typically: GD+1 smallest number of matrix-vector ops JDMQR lowest time (if matrix sparse enough) [ 5 ]

  6. Dynamic method chooses between the fastest two algorithms Matrix Fillet_13K: Dynamic matches best method GD+k JDQMR Dynamic 2 10 Time in seconds 1 10 0 5 10 15 20 25 30 35 40 45 50 Number of smallest eigenvalues found [ 6 ]

  7. PRIMME multi-layer interface – End user #include "primme.h" primme_params primme; primme_Initialize(&primme); primme.n = n; primme.numEvals = 20; primme.matrixMatvec = MV(x,y,k) primme.applyPreconditioner = PR(x,y,k) ierr = dprimme(evals, evecs, rnorms, &primme); Usually achieves full potential of the method [ 7 ]

  8. PRIMME multi-layer interface – Advanced user #include "primme.h" primme_params primme; primme. outputFile = stdout iseed = -1 printLevel = 5 restarting.scheme = primme_thick numEvals = 10 restarting.maxPrevRetain = 1 aNorm = 1.0 correction.precondition = 1 eps = 1.0e-12 correction.robustShifts = 1 maxBasisSize = 15 correction.maxInnerIterations = -1 minRestartSize = 7 correction.relTolBase = 1.5 maxBlockSize = 1 correction.convTest = adaptive_ETolerance maxOuterIterations = 10000 correction.projectors.LeftQ = 1 maxMatvecs = 300000 correction.projectors.LeftX = 1 target = primme_smallest correction.projectors.RightQ = 0 numTargetShifts = 0 correction.projectors.SkewQ = 0 targetShifts = 1.0 2.0 correction.projectors.RightX = 1 locking = 1 correction.projectors.SkewX = 1 initSize = 0 matrixMatvec = MV(x,y,k) numOrthoConst = 0; applyPreconditioner = PR(x,y,k) ierr = dprimme(evals, evecs, rnorms, &primme); [ 8 ]

  9. PRIMME in MATLAB Benefits to PRIMME users • Optimized Sparse, Block Matrix-Vector (MV) function • Optimized libraries for Sparse matrix inversion and ILU preconditioing • Optimized BLAS/LAPACK libraries • Ease of use/development • Easier to build and experiment on a SVD solver Benefits to MATLAB users • Availability of a preconditioned eigensolver • Availability of a preconditioned singular value solver • As robust and easy to use as eigs() but much faster [ 9 ]

  10. Interface similar to eigs() [evals] = [evecs, evals] = [evecs, evals, resnorms] = [evecs, evals, resnorms, primmeStats] = primme_eigs(A) primme_eigs(A, numEvals) primme_eigs(A, numEvals, target) primme_eigs(A, numEvals, target, opts) primme_eigs(A, numEvals, target, opts, eigsMethod) primme_eigs(A, numEvals, target, opts, eigsMethod, P) primme_eigs(A, numEvals, target, opts, eigsMethod, P1,P2) primme_eigs(A, numEvals, target, opts, eigsMethod, Pfun) primme_eigs(Afun, dim,...) We allow Afun even for smallest magnitude eigenvalues [ 10 ]

  11. PRIMME vs PRIMME MEX vs eigs() No preconditioner • Five smallest eigenvalues — eigs(Afun) to avoid inversion • PRIMME compiled -O3 including SparseMatVec,BLAS,LAPACK • PRIMME MEX uses MATLAB’s SparseMatVec, BLAS, LAPACK Time ratios over MEX JDqmr 3 PRIMME JDQMR PRIMME GD+k PRIMME MEX JDQMR 2.5 PRIMME MEX GD+k eigs() 2 • MATLAB Library benefits 1.5 • PRIMME MEX faster than 1 eigs() 0.5 0 s A K 2 1 2 w e 1 d 3 e 5 f v e 3 c n a n r e o d w a t n C a n l a A l i P f r t c e p s D 3 [ 11 ]

  12. Performance of MATLAB’s Block Sparse Matvec Time(k MVs) / Time(block k MV) 2.8 2.6 2.4 2.2 Time ratio 2 1.8 Andrews 1.6 Cone_A finan512 1.4 Plate33K_A0 1.2 cfd1 3Dspectralwave2 1 0 5 10 15 20 Blocksize Block size of 2 biggest gain. Larger block no much better. [ 12 ]

  13. PRIMME MEX with block size of 2 In Block Krylov methods # Matvecs increase almost linearly with block size Using the MATLAB routines, block=2 offers better robustness with only small additional expense PRIMME mex: Time(block=k)/Time(block=1) PRIMME mex: Time(block=k)/Time(block=1) 2 2.2 block = 2 block = 2 block = 3 block = 3 2 1.8 block = 4 block = 4 Ratio over non−blocking Ratio over non−blocking 1.8 1.6 1.6 1.4 1.4 1.2 1.2 1 1 0.8 5 10 15 20 25 5 10 15 20 25 Number of eigenvalues of 3dspectralwave2 Number of eigenvalues of Andrews [ 13 ]

  14. Using PRIMME for SVD • Allow access to full functionality of PRIMME • Allow choice of normal equations (ATA) or augmented matrix B (OAAO) • Allow for various preconditioning techniques: A − 1 � � 0 – P ≈ ( A T A ) − 1 or P ≈ B − 1 = directly A − T 0 – P ≈ A − 1 explicitly or through ILU: P = U − 1 L − 1 ≈ A − 1 . Use as: � 0 P � PP T ≈ ( A T A ) − 1 or as ≈ B − 1 P T 0 (matrix products not formed) – Pfun user provided function for implementing any of the above P [ 14 ]

  15. Using PRIMME for SVD [S] = [U, S, V] = [U, S, V, norms, primmeout] = primme_svds(A) primme_svds(A, numSvs) primme_svds(A, numSvs, target) primme_svds(A, numSvs, target, opts) primme_svds(A, numSvs, target, opts, eigsMethod) primme_svds(A, numSvs, target, opts, eigsMethod, svdsMethod) primme_svds(A, numSvs, target, opts, eigsMethod, svdsMethod, P) primme_svds(A, numSvs, target, opts, eigsMethod, svdsMethod, P1,P2) primme_svds(A, numSvs, target, opts, eigsMethod, svdsMethod, Pfun) primme_svds(Afun, M, N,...) [ 15 ]

  16. What threshold to converge to? Let ( u i , σ i , v i ) approximate singular triplet. Define: r v = � Av i − σ i u i � r u = � A T u i − σ i v i � r ATA = � A T Av i − σ 2 i v i � � v � v � � r B = � B − σ i � u u If � v i � = 1, � u i � = � Av i / σ i � = 1, then r v = 0 and √ r u = r ATA = r B 2 σ i Want to guarantee r u < δ � A � , so converge each method to r ATA < σ i δ � A � √ r B < 2 δ � A � [ 16 ]

  17. Which SVD method? Convergence speed issue very fast for largest SVs (squared gaps) ATA slow for smallest but still much faster than OAAO slower for largest eigenvalues OAAO extremely slow and not robust for smallest (interior) SVs Accuracy issue can converge up to � A � ε mach OAAO can only converge up to � A � 2 ε mach ATA ⇒ for small σ i cannot reach needed σ i δ � A � , if δ < ε mach � A � / σ i Our PRIMME SVDS solution: � σ i δ � A � , � A � 2 ε mach � Use ATA to residual threshold max Use OAAO to improve the ATA approximations to the required threshold [ 17 ]

  18. Accuracy limit and dynamic switching A=diag([1:10 1000:100:1e6]), Prec=A+rand(1,10000)*1e4 5 OAAO 10 ATA to limit 2 ATA to eps||A|| switch to OAAO 0 10 ||A T u − σ v|| −5 10 −10 10 0 500 1000 1500 2000 2500 3000 Matvecs [ 18 ]

  19. Performance of primme svds small matrices Time of svds(B −1 ) / primme_svds(A −1 A −T ) 6 1 4 10 numSvals 5 Ratio 4 3 2 1 8 0 9 4 r 6 4 0 1 o 9 0 0 − l 2 2 2 c y a e w m s d d w p b When A = LU or A ≈ LU , store L T , U T for faster memory access. Still less memory than svds inverting B . [ 19 ]

  20. Performance of primme svds medium size matrices — 1 smallest sv Larger matrices where inversion possible −− nsvs = 1 3 10 SVDS(B −1 ) primme_svds(A −1 A −T ) 2 10 primme_svds(ILU precond) Seconds 1 10 0 10 −1 10 cage11 finan512 torso3 wang3 wang4 chipcool0 κ =12 κ =98 κ =5.8e2 κ =6.2e3 κ =4.0e5 κ =8.3e6 N=39K N=75K N=259K N=26K N=26K N=20K Shift-invert speedup factor 4 When preconditioning effective speedup 1000. [ 20 ]

Recommend


More recommend