
On computing quadrature-based bounds for the A-norm of the error in conjugate gradients



1. On computing quadrature-based bounds for the A-norm of the error in conjugate gradients

Petr Tichý, joint work with Gérard Meurant and Zdeněk Strakoš
Institute of Computer Science, Academy of Sciences of the Czech Republic
September 13, 2012, Podbanské, Slovakia, ALGORITMY 2012

2. Problem formulation

Consider a system Ax = b, where A ∈ R^{n×n} is symmetric positive definite. Without loss of generality, ‖b‖ = 1 and x_0 = 0.

3. The conjugate gradient method

input A, b
r_0 = b, p_0 = r_0
for k = 1, 2, ... do
    γ_{k−1} = r_{k−1}^T r_{k−1} / p_{k−1}^T A p_{k−1}
    x_k = x_{k−1} + γ_{k−1} p_{k−1}
    r_k = r_{k−1} − γ_{k−1} A p_{k−1}
    δ_k = r_k^T r_k / r_{k−1}^T r_{k−1}
    p_k = r_k + δ_k p_{k−1}
    test quality of x_k
end for
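For concreteness, here is a minimal NumPy sketch of the recurrences above; the function name, the returned coefficient lists, and the stopping test are illustrative additions, not part of the talk. The coefficients γ_{k−1} and δ_k are kept because the quadrature-based estimates later reuse them.

```python
import numpy as np

def cg(A, b, tol=1e-12, maxit=1000):
    """CG with the coupled two-term recurrences from the slide.

    Returns the iterates x_0, x_1, ... and the coefficient lists
    [gamma_0, gamma_1, ...] and [delta_1, delta_2, ...].
    """
    x = np.zeros_like(b)                 # x_0 = 0
    r = b.copy()                         # r_0 = b
    p = r.copy()                         # p_0 = r_0
    rr = r @ r                           # r_{k-1}^T r_{k-1}
    xs, gammas, deltas = [x.copy()], [], []
    for _ in range(maxit):
        Ap = A @ p
        gamma = rr / (p @ Ap)            # gamma_{k-1}
        x = x + gamma * p                # x_k = x_{k-1} + gamma_{k-1} p_{k-1}
        r = r - gamma * Ap               # r_k = r_{k-1} - gamma_{k-1} A p_{k-1}
        rr_new = r @ r
        delta = rr_new / rr              # delta_k
        p = r + delta * p                # p_k = r_k + delta_k p_{k-1}
        rr = rr_new
        xs.append(x.copy()); gammas.append(gamma); deltas.append(delta)
        if np.sqrt(rr) < tol:            # placeholder for "test quality of x_k"
            break
    return xs, gammas, deltas
```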

4. Mathematical properties of CG: optimality

CG generates x_k, r_k, p_k. The k-th Krylov subspace is K_k(A, b) ≡ span{b, Ab, ..., A^{k−1} b}. The residuals r_0, ..., r_{k−1} form an orthogonal basis of K_k(A, b). The CG approximation x_k is optimal:

    ‖x − x_k‖_A = min_{y ∈ K_k} ‖x − y‖_A.

5. A practically relevant question

How to measure the quality of an approximation?

Using residual information:
  - normwise backward error,
  - relative residual norm.

“Using of the residual vector r_k as a measure of the “goodness” of the estimate x_k is not reliable” [Hestenes & Stiefel 1952]

Using error estimates:
  - estimate of the A-norm of the error,
  - estimate of the Euclidean norm of the error.

“The function (x − x_k, A(x − x_k)) can be used as a measure of the “goodness” of x_k as an estimate of x.” [Hestenes & Stiefel 1952]

The (relative) A-norm of the error plays an important role in stopping criteria in many problems [Deuflhard 1994], [Arioli 2004], [Jiránek, Strakoš, Vohralík 2006].

6. The Lanczos algorithm

Let A be symmetric; compute an orthonormal basis of K_k(A, b).

input A, b
v_1 = b/‖b‖, β_0 = 0, v_0 = 0
for k = 1, 2, ... do
    α_k = v_k^T A v_k
    w = A v_k − α_k v_k − β_{k−1} v_{k−1}
    β_k = ‖w‖
    v_{k+1} = w/β_k
end for

T_k is the symmetric tridiagonal (Jacobi) matrix with diagonal entries α_1, ..., α_k and off-diagonal entries β_1, ..., β_{k−1}. The Lanczos algorithm can be represented by

    A V_k = V_k T_k + β_k v_{k+1} e_k^T,    V_k^* V_k = I.
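A matching sketch that also assembles T_k explicitly; returning β_k as well is my choice, since the Gauss-Radau extension later needs it.

```python
import numpy as np

def lanczos(A, b, k):
    """k Lanczos steps; returns V (n x (k+1)), the k x k tridiagonal T_k,
    and the last off-diagonal coefficient beta_k."""
    n = b.size
    V = np.zeros((n, k + 1))
    V[:, 0] = b / np.linalg.norm(b)           # v_1 = b / ||b||
    beta_prev, v_prev = 0.0, np.zeros(n)      # beta_0 = 0, v_0 = 0
    alphas, betas = [], []
    for j in range(k):
        Av = A @ V[:, j]
        alpha = V[:, j] @ Av                  # alpha_k = v_k^T A v_k
        w = Av - alpha * V[:, j] - beta_prev * v_prev
        beta = np.linalg.norm(w)              # beta_k = ||w||
        V[:, j + 1] = w / beta                # v_{k+1} = w / beta_k
        v_prev, beta_prev = V[:, j], beta
        alphas.append(alpha); betas.append(beta)
    T = (np.diag(alphas)
         + np.diag(betas[:-1], 1)
         + np.diag(betas[:-1], -1))
    return V, T, betas[-1]
```

One can then verify A @ V[:, :k] ≈ V[:, :k] @ T + beta_k * outer(v_{k+1}, e_k) numerically.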

7. CG versus Lanczos

Let A be symmetric positive definite. The CG approximation is given by x_k = V_k y_k, where T_k y_k = ‖b‖ e_1. It holds that

    v_{k+1} = (−1)^k r_k / ‖r_k‖,    T_k = L_k D_k L_k^T,

where L_k is unit lower bidiagonal with subdiagonal entries √δ_1, ..., √δ_{k−1}, and D_k ≡ diag(γ_0^{−1}, ..., γ_{k−1}^{−1}).
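This factorization is easy to verify numerically: assemble L_k and D_k from the CG coefficients and compare with the Lanczos matrix. A sketch reusing cg and lanczos from above (the helper name is mine):

```python
import numpy as np

def tridiag_from_cg(gammas, deltas):
    """Assemble T_k = L_k D_k L_k^T from the CG coefficients.

    L_k is unit lower bidiagonal with sqrt(delta_j) on the subdiagonal,
    D_k = diag(1/gamma_0, ..., 1/gamma_{k-1}).
    """
    k = len(gammas)
    L = np.eye(k)
    for j in range(1, k):
        L[j, j - 1] = np.sqrt(deltas[j - 1])   # sqrt(delta_j), j = 1, ..., k-1
    D = np.diag(1.0 / np.asarray(gammas))
    return L @ D @ L.T

# For any SPD A and vector b, both constructions give the same Jacobi matrix:
#   V, T, _ = lanczos(A, b, k)
#   xs, gammas, deltas = cg(A, b)
#   np.linalg.norm(T - tridiag_from_cg(gammas[:k], deltas[:k]))  # ~ eps
```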

8. CG versus Lanczos: summary

Both algorithms generate an orthogonal basis of the Krylov subspace K_k(A, b).
  - Lanczos generates an orthonormal basis v_1, ..., v_k using a three-term recurrence → T_k.
  - CG generates an orthogonal basis r_0, ..., r_{k−1} using a coupled two-term recurrence → T_k = L_k D_k L_k^T.
It holds that v_{k+1} = (−1)^k r_k / ‖r_k‖.

9. CG, Lanczos and Gauss quadrature: overview

[Diagram: the CG and Lanczos algorithms, Gauss quadrature, and orthogonal polynomials, linked through Jacobi matrices and the quadrature nodes and weights.]

10. CG, Lanczos and Gauss quadrature: corresponding formulas

At any iteration step k, CG (implicitly) determines the weights and nodes of the k-point Gauss quadrature

    ∫_ζ^ξ f(λ) dω(λ) = Σ_{i=1}^k ω_i^{(k)} f(θ_i^{(k)}) + R_k[f].

Here T_k is the Jacobi matrix, the θ_i^{(k)} are the eigenvalues of T_k, and the ω_i^{(k)} are the scaled and squared first components of the normalized eigenvectors of T_k.

For f(λ) ≡ λ^{−1}, in terms of Lanczos-related quantities,

    (T_n^{−1})_{1,1} = (T_k^{−1})_{1,1} + R_k[λ^{−1}],

and in terms of CG-related quantities,

    ‖x‖_A^2 = Σ_{j=0}^{k−1} γ_j ‖r_j‖^2 + ‖x − x_k‖_A^2.
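A quick numerical check of the last identity on a random SPD system, reusing the cg sketch from above; the test matrix, its spectrum, and the iteration numbers are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.linspace(0.1, 10.0, n)        # prescribed eigenvalues, lambda_min = 0.1
A = Q @ np.diag(lam) @ Q.T             # SPD test matrix
b = rng.standard_normal(n)
b /= np.linalg.norm(b)                 # ||b|| = 1, x_0 = 0 as on slide 2
x = np.linalg.solve(A, b)

xs, gammas, deltas = cg(A, b, maxit=40)
k = 25
err = x - xs[k]
lhs = x @ (A @ x)                      # ||x||_A^2
rhs = sum(gammas[j] * np.linalg.norm(b - A @ xs[j])**2 for j in range(k)) \
      + err @ (A @ err)                # sum gamma_j ||r_j||^2 + ||x - x_k||_A^2
print(abs(lhs - rhs))                  # small (~ machine precision here)
```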

11. Gauss-Radau quadrature

More general quadrature formulas:

    ∫_ζ^ξ f dω(λ) = Σ_{i=1}^k w_i f(ν_i) + Σ_{j=1}^m ŵ_j f(ν̂_j) + R_k[f],

where the weights [w_i]_{i=1}^k, [ŵ_j]_{j=1}^m and the nodes [ν_i]_{i=1}^k are unknowns, while the nodes [ν̂_j]_{j=1}^m are prescribed outside the open integration interval. The case m = 1 is Gauss-Radau quadrature.

Algebraically: given μ ≡ ν̂_1, find α̂_{k+1} so that μ is an eigenvalue of the extended matrix T̂_{k+1}, the tridiagonal matrix with diagonal α_1, ..., α_k, α̂_{k+1} and off-diagonals β_1, ..., β_k. The quadrature for f(λ) = λ^{−1} is then given by (T̂_{k+1}^{−1})_{1,1}.
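The modified entry α̂_{k+1} can be computed from T_k by the classical Golub-Meurant construction; a sketch assuming μ < λ_min(T_k), so that T_k − μI is nonsingular (function names are mine):

```python
import numpy as np

def radau_alpha(T, beta_k, mu):
    """alpha_hat such that mu is an eigenvalue of the extended matrix.

    det(T_hat - mu*I) = 0 is equivalent, via the Schur complement of the
    (k+1, k+1) entry, to alpha_hat = mu + beta_k^2 * [(T - mu*I)^{-1}]_{k,k}:
    solve (T - mu*I) d = beta_k^2 e_k and take alpha_hat = mu + d[-1].
    """
    k = T.shape[0]
    rhs = np.zeros(k)
    rhs[-1] = beta_k**2
    d = np.linalg.solve(T - mu * np.eye(k), rhs)
    return mu + d[-1]

def extend_radau(T, beta_k, mu):
    """Build the (k+1) x (k+1) Gauss-Radau matrix T_hat_{k+1}."""
    k = T.shape[0]
    T_hat = np.zeros((k + 1, k + 1))
    T_hat[:k, :k] = T
    T_hat[k, k - 1] = T_hat[k - 1, k] = beta_k
    T_hat[k, k] = radau_alpha(T, beta_k, mu)
    return T_hat
```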

12. Quadrature formulas: the Golub-Meurant-Strakoš approach

Quadrature formulas for f(λ) = λ^{−1} take the form

    (T_n^{−1})_{1,1} = (T_k^{−1})_{1,1} + R_k^{(G)},
    (T_n^{−1})_{1,1} = (T̂_{k+1}^{−1})_{1,1} + R_k^{(R)},

where R_k^{(G)} > 0, while R_k^{(R)} < 0 if μ ≤ λ_min. Equivalently,

    ‖x‖_A^2 = τ_k + ‖x − x_k‖_A^2,
    ‖x‖_A^2 = τ̂_k + R_k^{(R)},

where τ_k ≡ (T_k^{−1})_{1,1} and τ̂_k ≡ (T̂_{k+1}^{−1})_{1,1}.

[Golub & Meurant 1994, 1997, 2010], [Golub & Strakoš 1994]

13. Idea of estimating the A-norm of the error

Consider two quadrature rules at steps k and k + d, d > 0:

    ‖x‖_A^2 = τ_k + ‖x − x_k‖_A^2,
    ‖x‖_A^2 = τ̂_{k+d} + R̂_{k+d}.

Then

    ‖x − x_k‖_A^2 = τ̂_{k+d} − τ_k + R̂_{k+d}.

Gauss quadrature: R̂_{k+d} = R_{k+d}^{(G)} > 0 → lower bound.
Radau quadrature: R̂_{k+d} = R_{k+d}^{(R)} < 0 → upper bound.

How to compute τ̂_{k+d} − τ_k efficiently?
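Putting slides 10 to 13 together, a direct (deliberately non-recursive) illustration of the two bounds, reusing A, b, x, xs from the check on slide 10 and lanczos, extend_radau from the earlier sketches; μ = 0.09 is a valid choice here because λ_min(A) = 0.1 by construction.

```python
import numpy as np

k, d = 10, 4
mu = 0.09                                   # 0 < mu <= lambda_min(A)

V, T, _ = lanczos(A, b, k + d)
tau = lambda M: np.linalg.inv(M)[0, 0]      # (M^{-1})_{1,1}; recall ||b|| = 1

T_top = T[:k + d - 1, :k + d - 1]           # T_{k+d-1}
beta_last = T[k + d - 1, k + d - 2]         # beta_{k+d-1}
T_hat = extend_radau(T_top, beta_last, mu)  # Gauss-Radau matrix T_hat_{k+d}

err = x - xs[k]
err_A2 = err @ (A @ err)                    # ||x - x_k||_A^2
lower = tau(T[:k + d, :k + d]) - tau(T[:k, :k])   # Gauss rule at step k + d
upper = tau(T_hat) - tau(T[:k, :k])               # Radau rule at step k + d
print(lower <= err_A2 <= upper)             # True
```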

14. How to compute τ̂_{k+d} − τ_k?

For numerical reasons it is not good to compute τ_k and τ̂_{k+d} explicitly and subtract. Instead, we use the formula

    τ̂_{k+d} − τ_k = Σ_{j=k}^{k+d−2} (τ_{j+1} − τ_j) + (τ̂_{k+d} − τ_{k+d−1})
                  ≡ Σ_{j=k}^{k+d−2} Δ_j + Δ̂_{k+d−1},

and update the Δ's without subtraction. Recall that

    Δ_j = (T_{j+1}^{−1})_{1,1} − (T_j^{−1})_{1,1},
    Δ̂_{k+d−1} = (T̂_{k+d}^{−1})_{1,1} − (T_{k+d−1}^{−1})_{1,1}.

15. Golub and Meurant approach

[Golub & Meurant 1994, 1997]: use the tridiagonal matrices,

    CG → T_k → T_k − μI → T̂_k.

Compute the Δ's,

    Δ_{k−1} ≡ (T_k^{−1})_{1,1} − (T_{k−1}^{−1})_{1,1},
    Δ_k^{(μ)} ≡ (T̂_{k+1}^{−1})_{1,1} − (T_k^{−1})_{1,1},

and use the formulas

    ‖x − x_k‖_A^2 = Σ_{j=k}^{k+d−1} Δ_j + ‖x − x_{k+d}‖_A^2,
    ‖x − x_k‖_A^2 = Σ_{j=k}^{k+d−2} Δ_j + Δ_{k+d−1}^{(μ)} + R_{k+d}^{(R)}.

16. CGQL (Conjugate Gradients and Quadrature via Lanczos)

input A, b, x_0, μ
r_0 = b − A x_0, p_0 = r_0
δ_0 = 0, γ_{−1} = 1, c_1 = 1, β_0 = 0, d_0 = 1, α̃_1^{(μ)} = μ
for k = 1, ..., until convergence do
    CG-iteration(k)
    α_k = 1/γ_{k−1} + δ_{k−1}/γ_{k−2},    β_k^2 = δ_k/γ_{k−1}^2
    d_k = α_k − β_{k−1}^2/d_{k−1},    Δ_{k−1} = ‖r_0‖^2 c_k^2/d_k
    α̃_{k+1}^{(μ)} = μ + β_k^2/(α_k − α̃_k^{(μ)})
    c_{k+1}^2 = β_k^2 c_k^2/d_k^2
    Δ_k^{(μ)} = ‖r_0‖^2 β_k^2 c_k^2 / (d_k (d_k α̃_{k+1}^{(μ)} − β_k^2))
    Estimates(k, d)
end for

17. Our approach

[Meurant & T. 2012]: We use the tridiagonal matrices only implicitly. CG generates the LDL^T factorization of T_k, and we update the LDL^T factorizations of the tridiagonal matrices T̂_k. The algebraic manipulations are quite complicated, but in the end we get very simple formulas for updating Δ_{k−1} and Δ_k^{(μ)}. This idea can also be used for other types of quadrature (Gauss-Lobatto, anti-Gauss).

18. CGQ (Conjugate Gradients and Quadrature) [Meurant & T. 2012]

input A, b, x_0, μ
r_0 = b − A x_0, p_0 = r_0
Δ_0^{(μ)} = ‖r_0‖^2/μ
for k = 1, ..., until convergence do
    CG-iteration(k)
    Δ_{k−1} = γ_{k−1} ‖r_{k−1}‖^2
    Δ_k^{(μ)} = ‖r_k‖^2 (Δ_{k−1}^{(μ)} − Δ_{k−1}) / (μ (Δ_{k−1}^{(μ)} − Δ_{k−1}) + ‖r_k‖^2)
    Estimates(k, d)
end for
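A NumPy sketch of CGQ as read off this slide: plain CG plus two scalar updates per step. The unrolling of Estimates(k, d) into a sliding window of Δ's, bounding the error of the iterate d steps back, is my interpretation, so treat the windowing details as illustrative.

```python
import numpy as np

def cgq(A, b, mu, d=4, maxit=200):
    """CG with quadrature-based bounds: after step k, bound
    ||x - x_{k-d}||_A^2 below (Gauss) and above (Gauss-Radau, node mu)."""
    x = np.zeros_like(b)
    r = b.copy(); p = r.copy()
    rr = r @ r
    Delta = []                          # Delta[j]    = Delta_j
    Delta_mu = [rr / mu]                # Delta_mu[j] = Delta_j^{(mu)}
    bounds = []                         # (k, lower, upper) for ||x - x_k||_A^2
    for k in range(1, maxit + 1):
        Ap = A @ p
        gamma = rr / (p @ Ap)
        x += gamma * p
        r -= gamma * Ap
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        Delta.append(gamma * rr)        # Delta_{k-1} = gamma_{k-1}||r_{k-1}||^2
        t = Delta_mu[-1] - Delta[-1]    # Delta_{k-1}^{(mu)} - Delta_{k-1}
        Delta_mu.append(rr_new * t / (mu * t + rr_new))   # Delta_k^{(mu)}
        rr = rr_new
        if k >= d:                      # Estimates(k - d, d)
            lower = sum(Delta[k - d:k])
            upper = sum(Delta[k - d:k - 1]) + Delta_mu[k - 1]
            bounds.append((k - d, lower, upper))
    return x, bounds

# usage on the test problem above:
#   xk, bnds = cgq(A, b, mu=0.09, d=4, maxit=60)
#   each (k, lower, upper) in bnds brackets ||x - x_k||_A^2
```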

19. Practically relevant questions

The estimation is based on the formulas

    ‖x − x_k‖_A^2 = Σ_{j=k}^{k+d−1} Δ_j + ‖x − x_{k+d}‖_A^2,
    ‖x − x_k‖_A^2 = Σ_{j=k}^{k+d−2} Δ_j + Δ_{k+d−1}^{(μ)} + R_{k+d}^{(R)}.

We are able to compute Δ_j and Δ_j^{(μ)} almost for free. Practically relevant questions:
  - What happens in finite precision arithmetic?
  - How to choose d?
  - How to choose μ?

20. Finite precision arithmetic: CG behavior

Orthogonality is lost, convergence is delayed!

[Figure: the A-norm of the error ‖x − x_j‖_A in exact arithmetic (E) and in finite precision (FP), together with the loss of orthogonality ‖I − V_j^T V_j‖_F (FP), plotted against the iteration number.]

Identities need not hold in finite precision arithmetic!

21. Rounding error analysis

Lower bound [Strakoš & T. 2002, 2005]: the equality

    ‖x − x_k‖_A^2 = Σ_{j=k}^{k+d−1} Δ_j + ‖x − x_{k+d}‖_A^2

holds (up to a small inaccuracy) also in finite precision arithmetic for the computed vectors and coefficients.

Upper bound: there is no rounding error analysis of

    ‖x − x_k‖_A^2 = Σ_{j=k}^{k+d−2} Δ_j + Δ_{k+d−1}^{(μ)} + R_{k+d}^{(R)}.
