A new algorithm for computing quadrature-based bounds in conjugate gradients

Petr Tichý
Czech Academy of Sciences, Charles University in Prague

joint work with Gérard Meurant and Zdeněk Strakoš

Householder Symposium XIX, Spa, Belgium, June 8–13, 2014
Problem formulation

Consider a system
$$Ax = b,$$
where $A \in \mathbb{R}^{n \times n}$ is symmetric positive definite. Without loss of generality, $\|b\| = 1$ and $x_0 = 0$.
The conjugate gradient method

input $A$, $b$
$r_0 = b$, $p_0 = r_0$
for $k = 1, 2, \ldots$ do
    $\gamma_{k-1} = \dfrac{r_{k-1}^T r_{k-1}}{p_{k-1}^T A p_{k-1}}$
    $x_k = x_{k-1} + \gamma_{k-1} p_{k-1}$
    $r_k = r_{k-1} - \gamma_{k-1} A p_{k-1}$
    $\delta_k = \dfrac{r_k^T r_k}{r_{k-1}^T r_{k-1}}$
    $p_k = r_k + \delta_k p_{k-1}$
    test quality of $x_k$
end for

The CG coefficients define the matrices
$$D_k = \mathrm{diag}\left(\gamma_0^{-1}, \ldots, \gamma_{k-1}^{-1}\right), \qquad
L_k = \begin{pmatrix} 1 & & & \\ \sqrt{\delta_1} & 1 & & \\ & \ddots & \ddots & \\ & & \sqrt{\delta_{k-1}} & 1 \end{pmatrix}.$$
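For concreteness, a minimal NumPy sketch of the recurrence above; the function name and the returned coefficient arrays are illustrative choices, not part of the original algorithm statement. The coefficients $\gamma_{k-1}$ and $\|r_k\|^2$ are exactly the quantities the quadrature bounds later reuse.

```python
import numpy as np

def cg(A, b, tol=1e-10, maxit=1000):
    """Hestenes-Stiefel CG; also returns the coefficients gamma_{k-1}
    and the squared residual norms ||r_k||^2 used by the error bounds."""
    x = np.zeros_like(b)
    r = b.copy()                     # x_0 = 0, so r_0 = b
    p = r.copy()
    rr = r @ r                       # ||r_0||^2
    gammas, res2 = [], [rr]
    for _ in range(maxit):
        Ap = A @ p
        gamma = rr / (p @ Ap)        # gamma_{k-1}
        x += gamma * p
        r -= gamma * Ap
        rr_new = r @ r
        p = r + (rr_new / rr) * p    # delta_k = ||r_k||^2 / ||r_{k-1}||^2
        gammas.append(gamma)
        res2.append(rr_new)
        rr = rr_new
        if np.sqrt(rr) < tol:
            break
    return x, np.array(gammas), np.array(res2)
```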
How to measure quality of an approximation in CG?

A practically relevant question.

Using residual information:
- normwise backward error,
- relative residual norm.

"Using of the residual vector $r_k$ as a measure of the "goodness" of the estimate $x_k$ is not reliable" [Hestenes & Stiefel 1952]

Using error estimates:
- the $A$-norm of the error,
- the Euclidean norm of the error.

"The function $(x - x_k, A(x - x_k))$ can be used as a measure of the "goodness" of $x_k$ as an estimate of $x$." [Hestenes & Stiefel 1952]

The $A$-norm of the error plays an important role in stopping criteria [Deuflhard 1994], [Arioli 2004], [Jiránek, Strakoš, Vohralík 2006].
The Lanczos algorithm

Let $A$ be symmetric; compute an orthonormal basis of $\mathcal{K}_k(A, b)$:

input $A$, $b$
$v_1 = b/\|b\|$, $\beta_0 = 0$, $v_0 = 0$
for $k = 1, 2, \ldots$ do
    $\alpha_k = v_k^T A v_k$
    $w = A v_k - \alpha_k v_k - \beta_{k-1} v_{k-1}$
    $\beta_k = \|w\|$
    $v_{k+1} = w/\beta_k$
end for

The coefficients form the Jacobi matrix
$$T_k = \begin{pmatrix} \alpha_1 & \beta_1 & & \\ \beta_1 & \ddots & \ddots & \\ & \ddots & \ddots & \beta_{k-1} \\ & & \beta_{k-1} & \alpha_k \end{pmatrix}.$$
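A matching sketch of the plain Lanczos recurrence (no reorthogonalization), returning the Jacobi matrix $T_k$; the helper name is assumed:

```python
import numpy as np

def lanczos(A, b, k):
    """k steps of the symmetric Lanczos recurrence; returns the Jacobi
    matrix T_k and the orthonormal basis V_k of K_k(A, b)."""
    n = len(b)
    V = np.zeros((n, k + 1))
    alpha, beta = np.zeros(k), np.zeros(k)
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        w = A @ V[:, j] - (beta[j - 1] * V[:, j - 1] if j > 0 else 0.0)
        alpha[j] = V[:, j] @ w
        w -= alpha[j] * V[:, j]
        beta[j] = np.linalg.norm(w)
        V[:, j + 1] = w / beta[j]
    T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
    return T, V[:, :k]
```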
CG versus Lanczos

Let $A$ be symmetric positive definite.

Lanczos $\leftrightarrow$ CG: $\quad T_k \leftrightarrow D_k, L_k$

- Both algorithms generate an orthogonal basis of $\mathcal{K}_k(A, b)$.
- Lanczos uses a three-term recurrence $\to T_k$.
- CG uses a coupled two-term recurrence $\to D_k, L_k$.
- The two are connected by $T_k = L_k D_k L_k^T$.
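The relation $T_k = L_k D_k L_k^T$ can be checked numerically from the CG coefficients alone; a small sketch (with $\delta_j$ recovered as ratios of consecutive squared residual norms from the `cg` helper above):

```python
import numpy as np

def jacobi_from_cg(gammas, res2, k):
    """Assemble T_k = L_k D_k L_k^T from CG coefficients:
    D_k = diag(1/gamma_0, ..., 1/gamma_{k-1}), L_k unit lower bidiagonal
    with sqrt(delta_j), delta_j = ||r_j||^2/||r_{j-1}||^2, below the diagonal."""
    deltas = res2[1:k] / res2[: k - 1]        # delta_1, ..., delta_{k-1}
    D = np.diag(1.0 / gammas[:k])
    L = np.eye(k) + np.diag(np.sqrt(deltas), -1)
    return L @ D @ L.T                        # equals the Lanczos T_k in exact arithmetic
```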
CG, Lanczos and Gauss quadrature

CG, Lanczos $\leftrightarrow$ orthogonal polynomials $\leftrightarrow$ Gauss quadrature $\leftrightarrow$ Jacobi matrices

At any iteration step $k$, CG (implicitly) determines the weights and nodes of the $k$-point Gauss quadrature
$$\int_\zeta^\xi f(\lambda)\, d\omega(\lambda) = \sum_{i=1}^k \omega_i f(\theta_i) + R_k[f].$$
Gauss quadrature for $f(\lambda) \equiv \lambda^{-1}$

Gauss quadrature:
$$\int_\zeta^\xi \lambda^{-1}\, d\omega(\lambda) = \sum_{i=1}^k \frac{\omega_i}{\theta_i} + R_k[\lambda^{-1}].$$

Lanczos:
$$\left(T_n^{-1}\right)_{1,1} = \left(T_k^{-1}\right)_{1,1} + R_k[\lambda^{-1}].$$

CG:
$$\|x\|_A^2 = \underbrace{\sum_{j=0}^{k-1} \gamma_j \|r_j\|^2}_{\tau_k} + \|x - x_k\|_A^2.$$

Important: $R_k[\lambda^{-1}] > 0$.
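A quick numerical illustration of the CG identity $\|x\|_A^2 = \tau_k + \|x - x_k\|_A^2$; a sketch on a random SPD system, reusing the `cg` helper above (note that $\|x\|_A^2 = b^T x$ for the exact solution $x$):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.linspace(0.1, 1e3, n)) @ Q.T     # SPD test matrix
b = rng.standard_normal(n); b /= np.linalg.norm(b)
x = np.linalg.solve(A, b)

_, gammas, res2 = cg(A, b)
taus = np.cumsum(gammas * res2[:-1])    # tau_k = sum_{j<k} gamma_j ||r_j||^2
errA2 = b @ x - taus                    # = ||x - x_k||_A^2 in exact arithmetic:
                                        # positive and decreasing
```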
Gauss-Radau quadrature for $f(\lambda) = \lambda^{-1}$

With a prescribed node $\mu$,
$$\int_\zeta^\xi f(\lambda)\, d\omega(\lambda) = \underbrace{\sum_{i=1}^k \tilde\omega_i f(\tilde\theta_i) + \tilde\omega_{k+1} f(\mu)}_{\left(\tilde T_{k+1}^{-1}\right)_{1,1} \,\equiv\, \tilde\tau_{k+1}} + \tilde R_k[f],$$
where
$$\tilde T_{k+1} = \begin{pmatrix} \alpha_1 & \beta_1 & & & \\ \beta_1 & \ddots & \ddots & & \\ & \ddots & \ddots & \beta_{k-1} & \\ & & \beta_{k-1} & \alpha_k & \beta_k \\ & & & \beta_k & \tilde\alpha_{k+1} \end{pmatrix}$$
and $\tilde\alpha_{k+1}$ is chosen such that $\mu$ is an eigenvalue of $\tilde T_{k+1}$.

Important: if $0 < \mu \le \lambda_{\min}$, then $\tilde R_k[\lambda^{-1}] < 0$.
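One classical way to obtain $\tilde\alpha_{k+1}$ (following the Golub–Meurant construction) is to solve a tridiagonal system with $T_k - \mu I$; a sketch, function name assumed:

```python
import numpy as np

def radau_alpha(T_k, beta_k, mu):
    """Last diagonal entry of the Gauss-Radau matrix: mu is an eigenvalue
    of T~_{k+1} iff alpha~_{k+1} = mu + d_k, where d solves
    (T_k - mu I) d = beta_k^2 e_k."""
    k = T_k.shape[0]
    rhs = np.zeros(k)
    rhs[-1] = beta_k**2
    d = np.linalg.solve(T_k - mu * np.eye(k), rhs)
    return mu + d[-1]
```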
Idea of estimating the $A$-norm of the error

[Golub & Strakoš 1994], [Golub & Meurant 1994, 1997]

Consider two quadrature rules at steps $k$ and $k+d$, $d > 0$:
$$\|x\|_A^2 = \tau_k + \|x - x_k\|_A^2, \qquad \|x\|_A^2 = \hat\tau_{k+d} + \hat R_{k+d}.$$
Then
$$\|x - x_k\|_A^2 = \hat\tau_{k+d} - \tau_k + \hat R_{k+d}.$$

- Gauss quadrature: $R_{k+d} > 0$ $\to$ lower bound,
- Gauss-Radau quadrature: $\hat R_{k+d} < 0$ $\to$ upper bound.

How to compute $\hat\tau_{k+d} - \tau_k$ efficiently?
How to compute $\hat\tau_{k+d} - \tau_k$ efficiently?

$$\|x - x_k\|_A^2 = \hat\tau_{k+d} - \tau_k + \hat R_{k+d}.$$

For numerical reasons, it is not convenient to compute $\tau_k$ and $\hat\tau_{k+d}$ explicitly. Instead,
$$\hat\tau_{k+d} - \tau_k = \sum_{j=k}^{k+d-2} (\tau_{j+1} - \tau_j) + \left(\hat\tau_{k+d} - \tau_{k+d-1}\right) \equiv \sum_{j=k}^{k+d-2} \Delta_j + \hat\Delta_{k+d-1},$$
and update the $\Delta_j$'s without subtractions. Recall that $\tau_j = \left(T_j^{-1}\right)_{1,1}$.
Golub and Meurant approach

[Golub & Meurant 1994, 1997]: use the tridiagonal matrices,
$$\text{CG} \to T_k \to \tilde T_k \ (\text{obtained via } T_k - \mu I),$$
and compute the $\Delta$'s using updating strategies; there is no need to store the tridiagonal matrices. Use the formulas
$$\|x - x_k\|_A^2 = \sum_{j=k}^{k+d-1} \Delta_j + \|x - x_{k+d}\|_A^2,$$
$$\|x - x_k\|_A^2 = \sum_{j=k}^{k+d-2} \Delta_j + \Delta_{k+d-1}^{(\mu)} + R_{k+d}^{(R)}.$$
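Given the stored $\Delta_j$'s, the Estimates($k$, $d$) step reduces to two window sums; a sketch of how it might look, with names assumed:

```python
import numpy as np

def estimates(deltas, delta_mu, k, d):
    """Bounds on ||x - x_k||_A once iterations up to k + d are available:
    deltas[j] holds Delta_j, delta_mu holds Delta_{k+d-1}^{(mu)}."""
    window = deltas[k : k + d]                        # Delta_k, ..., Delta_{k+d-1}
    lower = np.sqrt(np.sum(window))                   # Gauss: lower bound
    upper = np.sqrt(np.sum(window[:-1]) + delta_mu)   # Gauss-Radau: upper bound
    return lower, upper
```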
CGQL (Conjugate Gradients and Quadrature via Lanczos)

input $A$, $b$, $x_0$, $\mu$
$r_0 = b - A x_0$, $p_0 = r_0$
$\delta_0 = 0$, $\gamma_{-1} = 1$, $c_1 = 1$, $\beta_0 = 0$, $d_0 = 1$, $\tilde\alpha_1^{(\mu)} = \mu$
for $k = 1, \ldots,$ until convergence do
    CG-iteration($k$)
    $\alpha_k = \dfrac{1}{\gamma_{k-1}} + \dfrac{\delta_{k-1}}{\gamma_{k-2}}, \qquad \beta_k^2 = \dfrac{\delta_k}{\gamma_{k-1}^2}$
    $d_k = \alpha_k - \dfrac{\beta_{k-1}^2}{d_{k-1}}, \qquad \Delta_{k-1} = \dfrac{\|r_0\|^2 c_k^2}{d_k}$
    $\tilde\alpha_{k+1}^{(\mu)} = \mu + \dfrac{\beta_k^2}{\alpha_k - \tilde\alpha_k^{(\mu)}}$
    $c_{k+1}^2 = \dfrac{\beta_k^2 c_k^2}{d_k^2}, \qquad \Delta_k^{(\mu)} = \dfrac{\|r_0\|^2 c_k^2 \beta_k^2}{d_k \left(d_k \tilde\alpha_{k+1}^{(\mu)} - \beta_k^2\right)}$
    Estimates($k$, $d$)
end for
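Transcribed into code, the CGQL coefficient recurrences look as follows; a sketch derived from the formulas above, consuming the CG coefficients $\gamma_{k-1}$ and $\delta_k = \|r_k\|^2/\|r_{k-1}\|^2$. Comparing with the CGQ slide below shows how much machinery the tridiagonal detour requires:

```python
def cgql_updates(gammas, deltas, r0_norm2, mu):
    """Delta_{k-1} and Delta_k^{(mu)} via the Lanczos coefficients
    alpha_k = 1/gamma_{k-1} + delta_{k-1}/gamma_{k-2},
    beta_k^2 = delta_k/gamma_{k-1}^2 (CGQL recurrences)."""
    c2, dk, beta2, alpha_mu = 1.0, 1.0, 0.0, mu   # c_1^2, d_0, beta_0^2, alpha~_1
    gamma_prev = 1.0                               # gamma_{-1} = 1
    Deltas, Deltas_mu = [], []
    for k in range(1, len(gammas) + 1):
        gamma = gammas[k - 1]
        delta_prev = deltas[k - 2] if k > 1 else 0.0   # delta_0 = 0
        alpha = 1.0 / gamma + delta_prev / gamma_prev
        beta2_new = deltas[k - 1] / gamma**2
        dk = alpha - beta2 / dk                    # d_k
        Deltas.append(r0_norm2 * c2 / dk)          # Delta_{k-1}
        alpha_mu = mu + beta2_new / (alpha - alpha_mu)          # alpha~_{k+1}
        Deltas_mu.append(r0_norm2 * c2 * beta2_new /
                         (dk * (dk * alpha_mu - beta2_new)))    # Delta_k^{(mu)}
        c2 = beta2_new * c2 / dk**2                # c_{k+1}^2
        beta2, gamma_prev = beta2_new, gamma
    return Deltas, Deltas_mu
```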
Our approach

[Meurant & T. 2013]: update the $LDL^T$ decompositions of $T_k$ and $\tilde T_k$,
$$\text{CG} \to L_k D_k L_k^T, \ \tilde L_k \tilde D_k \tilde L_k^T.$$

- We use the tridiagonal matrices only implicitly.
- We get very simple formulas for updating $\Delta_{k-1}$ and $\Delta_k^{(\mu)}$.
- In [Meurant & T. 2013], this idea is also used for other types of quadrature (Gauss-Lobatto, Anti-Gauss).
CGQ (Conjugate Gradients and Quadrature)

[Meurant & T. 2013]

input $A$, $b$, $x_0$, $\mu$
$r_0 = b - A x_0$, $p_0 = r_0$, $\Delta_0^{(\mu)} = \dfrac{\|r_0\|^2}{\mu}$
for $k = 1, \ldots,$ until convergence do
    CG-iteration($k$)
    $\Delta_{k-1} = \gamma_{k-1} \|r_{k-1}\|^2$
    $\Delta_k^{(\mu)} = \dfrac{\|r_k\|^2 \left(\Delta_{k-1}^{(\mu)} - \Delta_{k-1}\right)}{\mu \left(\Delta_{k-1}^{(\mu)} - \Delta_{k-1}\right) + \|r_k\|^2}$
    Estimates($k$, $d$)
end for
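Putting it together, a self-contained NumPy sketch of CGQ as just described; the function signature, the stopping rule on the upper bound, and the returned bound history are implementation choices, not part of the original statement:

```python
import numpy as np

def cgq(A, b, mu, d=1, tol=1e-8, maxit=1000):
    """CG with quadrature-based bounds on ||x - x_j||_A for the delayed
    step j = k - d.  Requires 0 < mu <= lambda_min(A) for the upper bound."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rr = r @ r
    delta_mu = rr / mu                    # Delta_0^{(mu)}
    deltas, bounds = [], []
    for k in range(1, maxit + 1):
        Ap = A @ p
        gamma = rr / (p @ Ap)             # gamma_{k-1}
        x += gamma * p
        r -= gamma * Ap
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        deltas.append(gamma * rr)         # Delta_{k-1} = gamma_{k-1}||r_{k-1}||^2
        if k >= d:                        # Estimates(k, d) for step j = k - d
            j = k - d
            lower = np.sqrt(sum(deltas[j:]))
            upper = np.sqrt(sum(deltas[j:k-1]) + delta_mu)  # uses Delta_{k-1}^{(mu)}
            bounds.append((j, lower, upper))
            if upper < tol:
                break
        diff = delta_mu - deltas[-1]
        delta_mu = rr_new * diff / (mu * diff + rr_new)     # Delta_k^{(mu)}
        rr = rr_new
    return x, bounds
```

Note that each iteration adds only a handful of scalar operations on top of plain CG, which is the "almost for free" claim on the next slide.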
Conclusions (theoretical part)

- Simple formulas for computing bounds on $\|x - x_k\|_A$.
- Almost for free.
- They work well also with preconditioning.
- Behaviour in finite precision arithmetic?
CG in finite precision arithmetic

Orthogonality is lost, convergence is delayed!

[Figure: semilogarithmic plot comparing the exact-arithmetic error norm $\|x - x_j\|_A$ (E), the finite-precision error norm $\|x - x_j\|_A$ (FP), and the loss of orthogonality $\|I - V_j^T V_j\|_F$ (FP) over about 120 iterations.]

Identities need not hold in finite precision arithmetic!
Bounds in finite precision arithmetic

Observation: CGQL and CGQ give the same results (up to a small inaccuracy).

Do the bounds correspond to $\|x - x_k\|_A$?

- Gauss quadrature lower bound $\to$ yes [Strakoš & T. 2002].
- What about the Gauss-Radau upper bound?
$$\|x - x_k\|_A^2 = \Delta_k^{(\mu)} + R_{k+1}^{(R)}, \qquad \|x - x_k\|_A \le \sqrt{\Delta_k^{(\mu)}}.$$
Gauss-Radau upper bound, exact arithmetic

Strakoš matrix, $n = 48$, $\lambda_1 = 0.1$, $\lambda_n = 1000$, $\rho = 0.9$, $d = 1$.

[Figure: $\|x - x_k\|_A$ together with the Gauss-Radau upper bounds for $\mu = 0.5\,\lambda_1$ and $\mu = \lambda_1$ over 50 iterations.]
Gauss-Radau upper bound, finite precision arithmetic

Strakoš matrix, $n = 48$, $\lambda_1 = 0.1$, $\lambda_n = 1000$, $\rho = 0.9$, $d = 1$.

[Figure: $\|x - x_k\|_A$ together with the Gauss-Radau upper bounds for $\mu = 0.5\,\lambda_1$ and $\mu = \lambda_1$ over about 120 iterations.]
Gauss-Radau upper bound, finite precision arithmetic

Strakoš matrix, $n = 48$, $\lambda_1 = 0.1$, $\lambda_n = 1000$, $\rho = 0.9$, $d = 1$; $\mu = \alpha\,\lambda_1$, $\alpha \in (0, 1]$, $\alpha \approx 1$ (red), $\alpha \approx 0$ (magenta).

[Figure: the upper bounds $\sqrt{\Delta_k^{(\mu)}}$ for iterations 80–120.]
Gauss-Radau upper bound, finite precision arithmetic

Strakoš matrix, $n = 48$, $\lambda_1 = 0.1$, $\lambda_n = 1000$, $\rho = 0.9$, $d = 1$; $\mu = \alpha\,\lambda_1$, $\alpha \in (0, 1]$, $\alpha \approx 1$ (red), $\alpha \approx 0$ (magenta).

[Figure: the scaled quantities $\sqrt{\alpha\,\Delta_k^{(\mu)}}$ for iterations 80–120.]
Conclusions (numerical observations)

Gauss-Radau upper bound:
- It seems that $\sqrt{\varepsilon}$ is a limiting level for the accuracy of the Gauss-Radau upper bound.
- We cannot avoid subtractions in computing this bound.
- If $\mu \approx \lambda_1$, then $T_k - \mu I$ may be ill conditioned.

Simple formulas $\to$ investigation of numerical behaviour. This understanding can help in suggesting another approach, and in improving the Gauss quadrature lower bound (adaptive choice of $d$).