Modern Krylov subspace methods
(and applications to parabolic control problems)

Daniel B. Szyld
Temple University, Philadelphia

Scientific and Statistical Computing Seminar
University of Chicago
25 October 2012
Collaborators

Xiuhong Du, Alfred University
Marcus Sarkis, Worcester Poly. Inst., and IMPA, Rio de Janeiro
Christian Schaerer, Universidad Nacional de Asunción
Valeria Simoncini, Università di Bologna
Plan for the talk

• Introduction to Truncated Krylov subspace methods
• Inexact Krylov subspace methods: Introduction and Applications
• Special case of parabolic control problems
General Problem Statement

Solve a system Hx = b, with H Hermitian or non-Hermitian, using Krylov subspace iterative methods.
Krylov subspace methods

    K_m(H, r_0) = span{ r_0, H r_0, H^2 r_0, ..., H^{m-1} r_0 }.

The subspaces are nested: K_m ⊂ K_{m+1}.

Given x_0 and r_0 = b − H x_0, find an approximation

    x_m ∈ x_0 + K_m(H, r_0)

satisfying some property.
Krylov subspace methods (cont.)

Conditions:

Galerkin, e.g., FOM, CG:
    b − H x_m ⊥ K_m(H, r_0)

Petrov–Galerkin, e.g., GMRES, MINRES:
    b − H x_m ⊥ H K_m(H, r_0),
or equivalently
    x_m = arg min ||b − H x||_2 over x ∈ x_0 + K_m(H, r_0).
Krylov subspace methods (cont.)

Some implementation issues

• Methods work by suitably choosing a basis of K_m(H, r_0).
• Let v_1, v_2, ..., v_m be such a basis, chosen to be orthonormal.
• One can of course run Gram–Schmidt on { r_0, H r_0, H^2 r_0, ..., H^{m-1} r_0 } (not advised). Instead, Arnoldi [1951] (for general H) said:

    v_1 = r_0, normalized
    v_2 = H v_1 − ⟨v_1, H v_1⟩ v_1, normalized;   span{v_1, v_2} = span{r_0, H r_0}
    v_3 = H v_2 − ⟨v_2, H v_2⟩ v_2 − ⟨v_1, H v_2⟩ v_1, normalized;   span{v_1, v_2, v_3} = span{r_0, H r_0, H^2 r_0}
    etc.
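Why the raw power basis is "not advised": the vectors H^j r_0 align ever more closely with a dominant eigendirection, so the basis matrix becomes numerically rank-deficient. A toy illustration (not from the slides; all names are ours):

```python
import numpy as np

# The raw Krylov basis {r0, H r0, H^2 r0, ...} degenerates quickly:
# even with normalized columns, its condition number explodes.
rng = np.random.default_rng(0)
H = rng.standard_normal((200, 200))
v = rng.standard_normal(200)

cols = []
for _ in range(10):
    cols.append(v / np.linalg.norm(v))  # normalize so only the angles matter
    v = H @ v
K = np.column_stack(cols)
print(f"cond(K) = {np.linalg.cond(K):.2e}")  # huge already for m = 10
```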
Arnoldi method

Let β = ||r_0|| and v_1 = r_0/β. For k = 1, ... :

Compute H v_k, then

    v_{k+1} h_{k+1,k} = H v_k − Σ_{j=1}^{k} v_j h_{jk},

where h_{jk} = ⟨v_j, H v_k⟩, j ≤ k, and h_{k+1,k} is positive and such that ||v_{k+1}|| = 1.

In practice: modified Gram–Schmidt or Householder orthogonalization.

With V_m = [v_1, v_2, ..., v_m], obtain the Arnoldi relation

    H V_m = V_{m+1} H_{m+1,m},

where H_{m+1,m} is (m+1) × m upper Hessenberg.
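A minimal NumPy sketch of this loop (modified Gram–Schmidt variant); the function and variable names are ours, not from the talk:

```python
import numpy as np

def arnoldi(H, r0, m):
    """Arnoldi with modified Gram-Schmidt.
    Returns V (n x (m+1), orthonormal columns) and Hbar ((m+1) x m,
    upper Hessenberg) satisfying H @ V[:, :m] == V @ Hbar."""
    n = r0.shape[0]
    V = np.zeros((n, m + 1))
    Hbar = np.zeros((m + 1, m))
    V[:, 0] = r0 / np.linalg.norm(r0)
    for k in range(m):
        w = H @ V[:, k]                    # one matrix-vector product per step
        for j in range(k + 1):             # orthogonalize against all previous v_j
            Hbar[j, k] = V[:, j] @ w
            w -= Hbar[j, k] * V[:, j]
        Hbar[k + 1, k] = np.linalg.norm(w)
        if Hbar[k + 1, k] == 0.0:          # lucky breakdown: the space is H-invariant
            return V[:, :k + 1], Hbar[:k + 1, :k + 1]
        V[:, k + 1] = w / Hbar[k + 1, k]
    return V, Hbar
```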
Arnoldi relation (cont.)

    H_{m+1,m} = \begin{pmatrix}
        h_{11} & h_{12} & h_{13} & \cdots & h_{1m}    \\
        h_{21} & h_{22} & h_{23} & \cdots & h_{2m}    \\
               & h_{32} & \ddots &        & \vdots    \\
               &        & \ddots & \ddots & \vdots    \\
               &        &        & h_{m,m-1} & h_{mm} \\
               &        &        &        & h_{m+1,m}
    \end{pmatrix}
    = \begin{pmatrix} H_m \\ h_{m+1,m} e_m^T \end{pmatrix}

    H V_m = V_{m+1} H_{m+1,m} = V_m H_m + h_{m+1,m} v_{m+1} e_m^T
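A quick numerical check of this relation, reusing the arnoldi() sketch above:

```python
import numpy as np

# Verify H V_m = V_m H_m + h_{m+1,m} v_{m+1} e_m^T numerically.
rng = np.random.default_rng(1)
H = rng.standard_normal((100, 100))
r0 = rng.standard_normal(100)
m = 8
V, Hbar = arnoldi(H, r0, m)

lhs = H @ V[:, :m]
rhs = V[:, :m] @ Hbar[:m, :]               # V_m H_m
rhs[:, -1] += Hbar[m, m - 1] * V[:, m]     # + h_{m+1,m} v_{m+1} e_m^T
print(np.linalg.norm(lhs - rhs))           # ~ machine precision
```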
Krylov subspace methods (cont.)

More implementation issues

An element of K_m(H, v_1) is a linear combination of v_1, v_2, ..., v_m, i.e., of the form V_m y, y ∈ R^m.

Each method finds y = y_m, and we have x_m = V_m y_m.

For FOM, or CG, we have the Galerkin condition r_m ⊥ K_m, i.e.,

    0 = V_m^T (b − H x_m) = V_m^T b − V_m^T H V_m y_m = β e_1 − H_m y_m.
FOM: Full Orthogonalization Method [Saad, 1978]

Use the Arnoldi method to construct V_m, H_m.
Solve H_m y_m = β e_1.
The approximation is x_m = V_m y_m.
Test for convergence: is ||r_m|| < ε ?

Cost: one matrix-vector product per step; at step m, orthogonalization (m inner products), one solution of a small m × m upper Hessenberg system (LU with no fill). (||r_m|| cheap or free – no details here.)

Storage: at step m, the m vectors v_1, v_2, ..., v_m.
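A sketch of one FOM pass built on the arnoldi() sketch above (names are ours; no restarting, and the true residual is recomputed explicitly rather than via the cheap update alluded to on the slide):

```python
import numpy as np

def fom(H, b, x0, m):
    """One FOM pass of m Arnoldi steps; assumes no Arnoldi breakdown."""
    r0 = b - H @ x0
    beta = np.linalg.norm(r0)
    V, Hbar = arnoldi(H, r0, m)
    rhs = np.zeros(m)
    rhs[0] = beta                           # beta * e_1
    y = np.linalg.solve(Hbar[:m, :m], rhs)  # H_m y_m = beta e_1 (Hessenberg LU: no fill)
    x = x0 + V[:, :m] @ y
    return x, np.linalg.norm(b - H @ x)     # true residual, for the convergence test
```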
GMRES implementation

We use the Arnoldi method to obtain V_m, and as before, an element of K_m(H, v_1) is of the form V_m y, y ∈ R^m.

Recall: β = ||r_0||, H V_m = V_{m+1} H_{m+1,m}.

    b − H x_m = r_0 − H V_m y = β v_1 − V_{m+1} H_{m+1,m} y = V_{m+1} (β e_1 − H_{m+1,m} y)

    ||r_m|| = min_{x ∈ x_0 + K_m} ||b − H x|| = min_{y ∈ R^m} ||β e_1 − H_{m+1,m} y||

QR factorization: H_{m+1,m} = Q_{m+1} R_{m+1,m}, with R_{m+1,m} = \begin{pmatrix} R_m \\ 0 \end{pmatrix}, so

    ||r_m|| = min_{y ∈ R^m} ||Q_{m+1}^T β e_1 − R_{m+1,m} y||.
    ||r_m|| = min_{y ∈ R^m} ||Q_{m+1}^T β e_1 − R_{m+1,m} y||.

Write Q_{m+1}^T β e_1 = \begin{pmatrix} t_m \\ ρ_{m+1} \end{pmatrix}. Then y_m = R_m^{-1} t_m and x_m = V_m y_m. Furthermore,

    ||r_m|| = ||Q_{m+1}^T β e_1 − R_{m+1,m} y_m|| = |ρ_{m+1}|.

Use this computed residual for stopping (it may deviate from the true residual!).

Use Givens rotations for the QR factorization, saving the rotations from previous steps. Only two entries per step are needed.
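A sketch of one GMRES(m) cycle, again reusing arnoldi() (names are ours). Production codes update the QR factorization of H_{m+1,m} with one Givens rotation per step, as on the slide; solving the small (m+1) × m least-squares problem in one shot gives the same y_m and is enough for a sketch:

```python
import numpy as np

def gmres_cycle(H, b, x0, m):
    """One GMRES(m) cycle: minimize ||beta e_1 - Hbar y|| over y."""
    r0 = b - H @ x0
    beta = np.linalg.norm(r0)
    V, Hbar = arnoldi(H, r0, m)
    rhs = np.zeros(Hbar.shape[0])
    rhs[0] = beta                                   # beta * e_1
    y, *_ = np.linalg.lstsq(Hbar, rhs, rcond=None)  # small (m+1) x m problem
    x = x0 + V[:, :Hbar.shape[1]] @ y
    return x, np.linalg.norm(b - H @ x)             # true residual
```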
GMRES

Cost: one matrix-vector product per step; at step m, orthogonalization (m inner products), rotations for the QR factorization, solution of an m × m triangular system.

Storage: at step m, the m vectors v_1, v_2, ..., v_m.

Residual norms monotonically nonincreasing, but stagnation possible. Superlinear convergence can be observed.
• Main costs:
  1. Matrix-vector product: H v_k
  2. Orthogonalization
  3. Storage (if there is no recursion)
• One popular alternative: restarted methods. Hit and miss. Little theory. [Saad, 1996], [Simoncini, 2000]
• It may happen that a larger value of the restart parameter is less efficient! [Embree, 2003]
This Talk

• Consider the case when one does not fully orthogonalize: Truncated methods.
• Reduce the cost of the matrix-vector product when H is either
  – Not known exactly
  – Computationally expensive (e.g., Schur complement, reduced Hessian)
  – Preconditioned with a variable matrix (i.e., iteration dependent)
• Apply all this to Parabolic Control Problems.
• Use a Parareal-in-time approximation.
Truncated Krylov subspace methods

For the same amount of storage (at most ℓ vectors), instead of restarting: only orthogonalize with respect to the previous ℓ vectors. In Arnoldi we then have

    v_{k+1} h_{k+1,k} = H v_k − Σ_{j=max{1, k−ℓ+1}}^{k} v_j h_{jk},

where h_{jk} = ⟨v_j, H v_k⟩, j ≤ k, and h_{k+1,k} is positive and such that ||v_{k+1}|| = 1.

Only these previous ℓ vectors need to be stored.
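A minimal sketch of one truncated-Arnoldi step (names are ours), keeping only the most recent ℓ basis vectors in a fixed-length deque:

```python
import numpy as np
from collections import deque

def truncated_arnoldi_step(H, V_recent, Hbar, k, ell):
    """One truncated Arnoldi step (0-based k): orthogonalize H v_k against
    at most the ell most recent basis vectors. V_recent is a
    deque(maxlen=ell) holding v_{max(0, k-ell+1)}, ..., v_k.
    Usage: V_recent = deque([r0 / np.linalg.norm(r0)], maxlen=ell),
    Hbar = np.zeros((m + 1, m)); then call for k = 0, 1, ..., m-1."""
    w = H @ V_recent[-1]                   # H v_k
    j0 = max(0, k - ell + 1)
    for j, vj in zip(range(j0, k + 1), V_recent):
        Hbar[j, k] = vj @ w                # only the last ell coefficients are computed
        w -= Hbar[j, k] * vj
    Hbar[k + 1, k] = np.linalg.norm(w)
    V_recent.append(w / Hbar[k + 1, k])    # maxlen=ell silently drops the oldest vector
    return V_recent, Hbar
```

The resulting H_{m+1,m} is no longer full upper Hessenberg but banded, as the next slide shows.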
The upper Hessenberg matrix is now banded (upper bandwidth ℓ):

    H_{m+1,m} = \begin{pmatrix}
        h_{11} & h_{12} & h_{13} &        &           &           \\
        h_{21} & h_{22} & h_{23} & \ddots &           &           \\
               & h_{32} & \ddots & \ddots & \ddots    &           \\
               &        & \ddots & \ddots & \ddots    & h_{m-2,m} \\
               &        &        & \ddots & \ddots    & h_{m-1,m} \\
               &        &        &        & h_{m,m-1} & h_{mm}    \\
               &        &        &        &           & h_{m+1,m}
    \end{pmatrix}
Truncated Krylov subspace methods (cont.)

• Truncated GMRES [Saad and Wu, 1996]; truncated FOM [Saad, 1981], [Jia, 1996].
• The basis v_1, v_2, ..., v_m of K_m(H, r_0) (m > ℓ) is not orthogonal, but x_m ∈ K_m(H, r_0), and the minimization (or Galerkin condition) is over the whole space.
• H_{m+1,m} banded with upper semiband ℓ − 2. The matrix V_m with the basis vectors is not orthogonal. Can be implemented so that only O(ℓ) vectors are stored.
• Extreme case, ℓ = 3: H_{m+1,m} tridiagonal. If H is SPD, FOM reduces to CG (and V_m is automatically orthogonal).
• Theory for "non-optimal methods" [Simoncini and Szyld, 2005]
Example: L(u) = −u_xx − u_yy + 100(x+y) u_x + 100(x+y) u_y on [0,1]^2, Dirichlet b.c., centered 5-point discretization, n = 2500.

[Figure: residual norm (from 10^1 down to 10^{-9}) vs. flops (×10^6, left) and vs. number of iterations (right), for GMRES and Truncated ℓ = 3.]
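A sketch of how such a test matrix might be assembled (our construction; the talk's actual discretization details may differ). With a 50 × 50 interior grid, n = 2500:

```python
import numpy as np
import scipy.sparse as sp

# 5-point centered discretization of
#   L(u) = -u_xx - u_yy + 100(x+y) u_x + 100(x+y) u_y
# on [0,1]^2 with Dirichlet b.c.; interior grid N x N, h = 1/(N+1).
N = 50
h = 1.0 / (N + 1)
pts = np.arange(1, N + 1) * h
X, Y = np.meshgrid(pts, pts, indexing="ij")
c = 100.0 * (X + Y).ravel()               # convection coefficient per grid point

e = np.ones(N - 1)
D2 = sp.diags([-e, 2 * np.ones(N), -e], [-1, 0, 1]) / h**2   # discretizes -u''
D1 = sp.diags([-e, e], [-1, 1]) / (2 * h)                    # centered u'
I = sp.identity(N)

A = sp.kron(D2, I) + sp.kron(I, D2) \
    + sp.diags(c) @ (sp.kron(D1, I) + sp.kron(I, D1))
print(A.shape)                            # (2500, 2500)
```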
Inexact Krylov subspace methods

• At the k-th iteration of the Krylov subspace method, use (H + D_k) v_{k−1} instead of H v_{k−1}, where ||D_k|| can be monitored.
• Two examples now:
  – Schur complement, where the inverse is approximated
  – Inexact preconditioning
• [Bouras, Frayssé, and Giraud, CERFACS reports 2000, SIMAX 2005] show experimentally that as k progresses, ||D_k|| can be allowed to be larger; see also [Golub and Ye, 1999], [Notay, 1999], [Sleijpen and van der Eshof, 2004], and [Simoncini and Eldén, 2002].
Inexact Krylov (cont.)

We repeat: ||D_k|| small at first, ||D_k|| can be big later. Convergence is maintained!

• Instead of H V_m = V_{m+1} H_{m+1,m}, we now have

    [(H + D_1) v_1, (H + D_2) v_2, ..., (H + D_m) v_m] = V_{m+1} H_{m+1,m}.

• The subspace spanned by v_1, v_2, ..., v_m is not a Krylov subspace, but V_m is orthogonal (in the full case).
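The Arnoldi sketch from earlier needs only a one-line change to model this: replace the exact product by a user-supplied inexact one. A hedged sketch (names are ours):

```python
import numpy as np

def inexact_arnoldi(apply_k, r0, m):
    """Arnoldi with an inexact operator: apply_k(k, v) returns (H + D_k) v.
    V stays orthonormal, but span(V) is no longer a Krylov subspace of H."""
    n = r0.shape[0]
    V = np.zeros((n, m + 1))
    Hbar = np.zeros((m + 1, m))
    V[:, 0] = r0 / np.linalg.norm(r0)
    for k in range(m):
        w = apply_k(k + 1, V[:, k])        # perturbed product (H + D_{k+1}) v
        for j in range(k + 1):             # full orthogonalization, as before
            Hbar[j, k] = V[:, j] @ w
            w -= Hbar[j, k] * V[:, j]
        Hbar[k + 1, k] = np.linalg.norm(w)
        V[:, k + 1] = w / Hbar[k + 1, k]
    return V, Hbar
```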
Theorem for Inexact FOM [Simoncini and Szyld, 2003]

True residual: r_m = b − H x_m = r_0 − H V_m y_m
Computed residual (e.g.): r̃_m = r_0 − V_{m+1} H_{m+1,m} y_m = r_0 − W_m y_m

Let ε > 0. If for every k ≤ m,

    ||D_k|| ≤ (σ_min(H_{m*}) / m*) (1 / ||r̃_{k−1}||) ε ≡ ℓ_m^F (1 / ||r̃_{k−1}||) ε,

then ||V_m^T r_m|| ≤ ε and ||r_m − r̃_m|| ≤ ε
(m* being the maximum number of iterations allowed).

Similar results hold for inexact GMRES; see also [Giraud, Gratton, Langou, 2007].
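In words: the more the computed residual has already decreased, the larger the perturbation that can be tolerated. A tiny sketch of the resulting per-step tolerance (treating σ_min(H_{m*}) as a known or estimated constant; names are ours):

```python
def inexact_matvec_tol(sigma_min_Hm, m_max, eps, res_norm_prev):
    """Allowed ||D_k|| at step k per the theorem:
    tol_k = (sigma_min(H_{m*}) / m*) * eps / ||r~_{k-1}||."""
    ell_F = sigma_min_Hm / m_max
    return ell_F * eps / res_norm_prev

# The tolerance relaxes as the computed residual shrinks:
print(inexact_matvec_tol(0.1, 50, 1e-8, 1.0))   # 2e-11 at the start
print(inexact_matvec_tol(0.1, 50, 1e-8, 1e-6))  # 2e-05 once ||r~|| = 1e-6
```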
Theorem for Inexact Truncated FOM

    ||D_k|| ≤ (σ_min(H_{m*}) σ_min(V_m) / m*) (1 / ||r̃_{k−1}||) ε ≡ ℓ_m^{TF} (1 / ||r̃_{k−1}||) ε

implies ||V_m^T r_m|| ≤ ε and δ_m = ||r_m − r̃_m|| ≤ ε.

Notes:
• This result applies in particular to Inexact CG.
• ℓ_m can be estimated from the problem, if information is available.