Preconditioning of conjugate-gradients in observation space for 4D-Var


  1. Preconditioning of conjugate-gradients in observation space for 4D-Var. S. Gratton (1,2), S. Gurol (2), Ph. L. Toint (3), J. Tshimanga (1). (1) ENSEEIHT, Toulouse, France; (2) CERFACS, Toulouse, France; (3) FUNDP, Namur, Belgium. Large-scale problems and Applications in the Earth Sciences.

  2. Outline. 1. Introduction. 2. Algorithms for primal space. 3. Working in the observation space: Krylov subspace methods; preconditioning; convergence properties; results on realistic systems. 4. Conclusions.

  3. Introduction. 4D-Var problem: formulation. A large-scale nonlinear weighted least-squares problem:
     $$\min_{x \in \mathbb{R}^n} f(x) = \frac{1}{2}\,\|x - x_b\|^2_{B^{-1}} + \frac{1}{2}\sum_{j=0}^{N} \left\|\mathcal{H}_j(\mathcal{M}_j(x)) - y_j\right\|^2_{R_j^{-1}},$$
     where $x \in \mathbb{R}^n$ is the control variable; the observations $y_j$ and the background $x_b$ are noisy; $\mathcal{M}_j$ are model operators; $\mathcal{H}_j$ are observation operators; $B$ is the background-error covariance matrix; $R_j$ are the observation-error covariance matrices.
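
To make the formulation concrete, here is a minimal NumPy sketch that evaluates this cost on a toy problem. The function name `fourdvar_cost` and the representation of each observation term as an `(M_j, H_j, y_j, Rj_inv)` tuple are illustrative choices, not anything from the talk.

```python
import numpy as np

def fourdvar_cost(x, x_b, B_inv, obs):
    """Evaluate f(x) = 1/2 ||x - x_b||^2_{B^{-1}}
    + 1/2 sum_j ||H_j(M_j(x)) - y_j||^2_{R_j^{-1}}."""
    dx = x - x_b
    cost = 0.5 * dx @ B_inv @ dx                 # background term
    for M_j, H_j, y_j, Rj_inv in obs:
        res = H_j(M_j(x)) - y_j                  # observation misfit at time j
        cost += 0.5 * res @ Rj_inv @ res
    return cost

# toy usage: identity model and observation operators on R^3
n = 3
obs = [(lambda x: x, lambda x: x, np.ones(n), np.eye(n))]
print(fourdvar_cost(np.full(n, 0.5), np.zeros(n), np.eye(n), obs))
```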

  4. Introduction. 4D-Var problem: formulation. The large-scale nonlinear weighted least-squares problem above is typically solved by a standard Gauss-Newton method, known as incremental 4D-Var in the data assimilation community:
     1. Solve the linearized subproblem at iteration $k$ (linesearch or trust-region techniques require this kind of step),
     $$\min_{\delta x \in \mathbb{R}^n} J(\delta x) = \frac{1}{2}\,\|\delta x - [x_b - x]\|^2_{B^{-1}} + \frac{1}{2}\,\|H\,\delta x - d\|^2_{R^{-1}},$$
     giving a sequence of quadratic minimization problems.
     2. Perform the update $x^{(k+1)}(t_0) = x^{(k)}(t_0) + \delta x^{(k)}$.
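
A minimal sketch of the resulting Gauss-Newton outer loop, assuming linear model and observation operators folded into a single matrix `H` and an exact inner solve; all names and sizes are illustrative. With a linear `H` a single outer iteration already suffices, so the loop only illustrates the general pattern.

```python
import numpy as np

def incremental_4dvar(x0, x_b, B_inv, H, R_inv, y, n_outer=3):
    # Gauss-Newton / incremental 4D-Var outer loop (illustrative, dense).
    x = x0.copy()
    for _ in range(n_outer):
        d = y - H @ x                        # innovation vector
        A = B_inv + H.T @ R_inv @ H          # Hessian of the quadratic J
        rhs = B_inv @ (x_b - x) + H.T @ R_inv @ d
        dx = np.linalg.solve(A, rhs)         # inner quadratic minimization
        x = x + dx                           # x^{(k+1)} = x^{(k)} + dx^{(k)}
    return x

rng = np.random.default_rng(0)
n, m = 8, 4
H = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
x_b = x_true + 0.1 * rng.standard_normal(n)
x = incremental_4dvar(np.zeros(n), x_b, np.eye(n), H, np.eye(m), H @ x_true)
```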

  5. Introduction. A quick note on globalization. We want to converge to a point where the gradient is zero, from any starting point. Technically, if the function is regular (gradient Lipschitz continuous, bounded from below), an "approximate" gradient is all that is needed for a globally convergent algorithm. There are basically two techniques to obtain provable global convergence: linesearch and trust region. In linesearch, a steplength is controlled along a descent direction. In trust regions, a model is minimized inside a sphere; in the case of the Taylor model, truncated conjugate gradients (CG) do this. In the rest of the talk, we focus on CG and assume it is suitably truncated.

  6. Introduction. 4D-Var problem: solution. The optimality condition gives
     $$(B^{-1} + H^T R^{-1} H)\,\delta x = B^{-1}(x_b - x) + H^T R^{-1} d.$$
     The aim is to approximately solve sequences of this linear system; the solution algorithms are conjugate-gradient-like methods. The exact solution writes
     $$\delta x = x_b - x_0 + \left(B^{-1} + H^T R^{-1} H\right)^{-1} H^T R^{-1}\left(d - H(x_b - x_0)\right).$$
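
As a sanity check, the following toy NumPy snippet (illustrative sizes and names) verifies that this closed form indeed satisfies the optimality condition.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 6, 3
H = rng.standard_normal((m, n))
L = rng.standard_normal((n, n)); B = L @ L.T + n * np.eye(n)  # SPD background covariance
R = np.eye(m)
x_b, x0, d = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(m)

A = np.linalg.inv(B) + H.T @ np.linalg.inv(R) @ H
rhs = np.linalg.inv(B) @ (x_b - x0) + H.T @ np.linalg.inv(R) @ d
dx = (x_b - x0) + np.linalg.solve(A, H.T @ np.linalg.inv(R) @ (d - H @ (x_b - x0)))
print(np.allclose(A @ dx, rhs))   # True: the closed form solves the system
```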

  7. Algorithms for primal space. Preconditioned Lanczos algorithm ($F^{1/2}$ is not required!).
     For $i = 1, 2, \ldots, l$:
     1. $w_i = (B^{-1} + H^T R^{-1} H)\, z_i$  (construction of the Krylov sequence)
     2. $w_i = w_i - \beta_i v_{i-1}$
     3. $\alpha_i = \langle w_i, z_i \rangle$  (orthogonalization)
     4. $w_{i+1} = w_i - \alpha_i v_i$
     5. $z_{i+1} = F w_{i+1}$  (apply the preconditioner)
     6. $\beta_{i+1} = \langle z_{i+1}, w_{i+1} \rangle^{1/2}$
     7. $v_{i+1} = w_{i+1} / \beta_{i+1}$  (normalization)
     8. $z_{i+1} = z_{i+1} / \beta_{i+1}$
     9. $V = [V, v_{i+1}]$  (orthonormal basis for the Krylov subspace)
     10. $T_{i,i} = \alpha_i$; $T_{i+1,i} = T_{i,i+1} = \beta_{i+1}$  (generate the tridiagonal matrix)
     Solution:
     1. $y_l = T_l^{-1} \beta_0 e_1$  (impose the condition $r_k \perp \mathcal{K}_l(A, r_0)$)
     2. $\delta x_l = F V_l y_l$  (find the approximate solution)
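
Below is a hedged NumPy transcription of this Lanczos variant, dense for clarity; the variable names and the breakdown tolerance are my own choices. Note that only products with $F$ appear, never $F^{1/2}$, and that since $z_i = F v_i$ the final step $F V_l y_l$ reduces to $Z_l y_l$.

```python
import numpy as np

def prec_lanczos(A, F, b, l):
    # Preconditioned Lanczos sketch for A x = b with SPD preconditioner F ~ A^{-1}.
    n = b.size
    V = np.zeros((n, l + 1))        # Lanczos basis (z_i^T v_j = delta_ij)
    Z = np.zeros((n, l + 1))        # preconditioned basis, z_i = F v_i
    T = np.zeros((l + 1, l + 1))    # tridiagonal projection of A
    z = F @ b
    beta0 = np.sqrt(z @ b)          # <z, w>^{1/2}: no square root of F needed
    V[:, 0], Z[:, 0] = b / beta0, z / beta0
    beta, k = 0.0, l
    for i in range(l):
        w = A @ Z[:, i]                     # construct the Krylov sequence
        if i > 0:
            w = w - beta * V[:, i - 1]
        alpha = w @ Z[:, i]                 # orthogonalization coefficient
        w = w - alpha * V[:, i]
        z = F @ w                           # apply the preconditioner
        beta = np.sqrt(z @ w)
        T[i, i] = alpha
        if beta < 1e-14:                    # happy breakdown: invariant subspace
            k = i + 1
            break
        T[i + 1, i] = T[i, i + 1] = beta
        V[:, i + 1], Z[:, i + 1] = w / beta, z / beta
    e1 = np.zeros(k); e1[0] = beta0
    y = np.linalg.solve(T[:k, :k], e1)      # impose r ⟂ K_k: T_k y = beta0 e1
    return Z[:, :k] @ y                     # dx = F V_k y = Z_k y

rng = np.random.default_rng(2)
n = 20
M = rng.standard_normal((n, n)); A = M @ M.T + n * np.eye(n)
F = np.diag(1.0 / np.diag(A))               # simple diagonal preconditioner
b = rng.standard_normal(n)
print(np.linalg.norm(A @ prec_lanczos(A, F, b, l=n) - b))
```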

  8. Algorithms for primal space. Preconditioned CG algorithm ($F^{1/2}$ is not required!).
     Initialization: $r_0 = b - A\,\delta x_0$, $z_0 = F r_0$, $p_0 = z_0$.
     For $i = 0, 1, \ldots$:
     1. $q_i = (B^{-1} + H^T R^{-1} H)\, p_i$
     2. $\alpha_i = \langle r_i, z_i \rangle / \langle q_i, p_i \rangle$  (compute the step length)
     3. $\delta x_{i+1} = \delta x_i + \alpha_i p_i$  (update the iterate)
     4. $r_{i+1} = r_i - \alpha_i q_i$  (update the residual)
     5. $r_{i+1} = r_{i+1} - R Z^T r_{i+1}$  (re-orthogonalization)
     6. $z_{i+1} = F r_{i+1}$  (update the preconditioned residual)
     7. $\beta_i = \langle r_{i+1}, z_{i+1} \rangle / \langle r_i, z_i \rangle$  (ensure $A$-conjugate directions)
     8. $R = [R, r_{i+1}/\beta_i]$  (store for re-orthogonalization)
     9. $Z = [Z, z_{i+1}/\beta_i]$  (store for re-orthogonalization)
     10. $p_{i+1} = z_{i+1} + \beta_i p_i$  (update the descent direction)
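
A NumPy sketch of this re-orthogonalized PCG, dense and illustrative only; the lists of scaled `(r_j, z_j)` pairs play the role of the $R$ and $Z$ matrices of steps 5, 8 and 9, and the scaling by $\langle r_j, z_j\rangle^{1/2}$ is a normalization choice of mine.

```python
import numpy as np

def pcg_reortho(A, F, b, maxit=60, tol=1e-10):
    # Preconditioned CG with full re-orthogonalization of the residuals
    # (sketch); F approximates A^{-1} and F^{1/2} never appears.
    x = np.zeros(b.size)
    r = b - A @ x                       # initial residual
    z = F @ r                           # preconditioned residual
    p = z.copy()                        # initial descent direction
    Rs, Zs = [], []                     # scaled past residual pairs
    for _ in range(maxit):
        rz = r @ z
        if np.sqrt(abs(rz)) < tol:
            break
        s = np.sqrt(rz)
        Rs.append(r / s); Zs.append(z / s)
        q = A @ p
        alpha = rz / (q @ p)            # step length
        x = x + alpha * p               # update the iterate
        r = r - alpha * q               # update the residual
        for rj, zj in zip(Rs, Zs):      # re-orthogonalization: r <- r - R Z^T r
            r = r - rj * (zj @ r)
        z = F @ r
        beta = (r @ z) / rz             # ensures A-conjugacy of directions
        p = z + beta * p
    return x

rng = np.random.default_rng(3)
n = 30
M = rng.standard_normal((n, n)); A = M @ M.T + n * np.eye(n)
F = np.diag(1.0 / np.diag(A))
b = rng.standard_normal(n)
print(np.linalg.norm(A @ pcg_reortho(A, F, b) - b))
```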

  9. Algorithms for primal space. Inner minimization. Minimizing
     $$J(\delta x_0) = \frac{1}{2}\,\|\delta x_0 - [x_b - x_0]\|^2_{B^{-1}} + \frac{1}{2}\,\|H\,\delta x_0 - d\|^2_{R^{-1}}$$
     amounts to solving
     $$(B^{-1} + H^T R^{-1} H)\,\delta x_0 = B^{-1}(x_b - x_0) + H^T R^{-1} d.$$
     The exact solution writes
     $$\delta x_0 = x_b - x_0 + \left(B^{-1} + H^T R^{-1} H\right)^{-1} H^T R^{-1}\left(d - H(x_b - x_0)\right),$$
     or equivalently (using the Sherman-Morrison-Woodbury formula)
     $$\delta x_0 = x_b - x_0 + B H^T \left(R + H B H^T\right)^{-1}\left(d - H(x_b - x_0)\right).$$
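
The equivalence of the two closed forms is easy to check numerically; the snippet below (toy sizes and names of my choosing) compares the primal and dual expressions directly.

```python
import numpy as np

# Check the Sherman-Morrison-Woodbury equivalence:
# (B^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1} == B H^T (R + H B H^T)^{-1}
rng = np.random.default_rng(4)
n, m = 10, 4
H = rng.standard_normal((m, n))
Lb = rng.standard_normal((n, n)); B = Lb @ Lb.T + n * np.eye(n)
Lr = rng.standard_normal((m, m)); R = Lr @ Lr.T + m * np.eye(m)

primal = np.linalg.solve(np.linalg.inv(B) + H.T @ np.linalg.inv(R) @ H,
                         H.T @ np.linalg.inv(R))
dual = B @ H.T @ np.linalg.inv(R + H @ B @ H.T)
print(np.allclose(primal, dual))   # True
```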

  10. Algorithms for primal space. Dual formulation: PSAS.
     1. Very popular when there are few observations compared to model variables.
     2. Relies on $\delta x_0 = x_b - x_0 + B H^T (R + H B H^T)^{-1} (d - H(x_b - x_0))$.
     3. Iteratively solve $(I + R^{-1} H B H^T)\, \hat v = R^{-1}(d - H(x_b - x_0))$ for $\hat v$.
     4. Set $\delta x_0 = x_b - x_0 + B H^T \hat v$.
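
A compact sketch of PSAS under these formulas, with a direct solve standing in for the CG-like iteration on the $m \times m$ dual system; `psas_solve` and all sizes are illustrative.

```python
import numpy as np

def psas_solve(B, R, H, x_b, x0, d):
    # PSAS: solve the m x m dual system, then map back to the primal space.
    m = R.shape[0]
    rhs = np.linalg.solve(R, d - H @ (x_b - x0))          # R^{-1}(d - H(x_b - x0))
    A_dual = np.eye(m) + np.linalg.solve(R, H @ B @ H.T)  # I + R^{-1} H B H^T
    v_hat = np.linalg.solve(A_dual, rhs)                  # in practice: CG-like iterations
    return x_b - x0 + B @ H.T @ v_hat                     # dx in the primal space

rng = np.random.default_rng(5)
n, m = 12, 5
H = rng.standard_normal((m, n))
L = rng.standard_normal((n, n)); B = L @ L.T + n * np.eye(n)
R = np.eye(m)
x_b, x0, d = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(m)
dx_dual = psas_solve(B, R, H, x_b, x0, d)
# agreement with the primal closed form
A = np.linalg.inv(B) + H.T @ np.linalg.inv(R) @ H
dx_primal = x_b - x0 + np.linalg.solve(A, H.T @ np.linalg.inv(R) @ (d - H @ (x_b - x0)))
print(np.allclose(dx_dual, dx_primal))   # True
```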

  11. Algorithms for primal space. Experiments. [Figure: quadratic cost function versus iterations (0 to 80; log-scale cost from 10^2 to 10^5) for the RPCG and PSAS solvers, with size(B) = 1024 and size(R) = 256.]

  12. Algorithms for primal space. Motivation: PSAS and CG-like algorithms.
     1. CG minimizes the incremental 4D-Var cost function during its iterations: it minimizes a quadratic approximation of the non-quadratic function, i.e. Gauss-Newton in the model space.
     2. PSAS does not minimize the incremental 4D-Var cost function during its iterations, but works in the observation space.
     Our goal: combine the advantages of both approaches in a trust-region framework, so as to guarantee convergence. This means keeping the variational property, to obtain the so-called Cauchy decrease even when iterations are truncated; being computationally efficient whenever the number of observations is significantly smaller than the size of the state vector; and getting global convergence in the observation space!

  13. Algorithms for primal space. Dual CG algorithm: assumptions.
     1. Suppose the CG algorithm is applied to solve the incremental 4D-Var subproblem using a preconditioning matrix $F$.
     2. Suppose there exists an $m \times m$ matrix $G$ such that $F H^T = B H^T G$.
     For "exact" preconditioners,
     $$\left(B^{-1} + H^T R^{-1} H\right)^{-1} H^T = B H^T \left(I + R^{-1} H B H^T\right)^{-1}.$$
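
For the "exact" preconditioner $F = (B^{-1} + H^T R^{-1} H)^{-1}$, the matrix $G = (I_m + R^{-1} H B H^T)^{-1}$ therefore satisfies the assumption; the following toy check (illustrative sizes) confirms the identity numerically.

```python
import numpy as np

# Numerical check of F H^T = B H^T G with F = A^{-1}, G = (I + R^{-1} H B H^T)^{-1}.
rng = np.random.default_rng(6)
n, m = 9, 3
H = rng.standard_normal((m, n))
Lb = rng.standard_normal((n, n)); B = Lb @ Lb.T + n * np.eye(n)
Lr = rng.standard_normal((m, m)); R = Lr @ Lr.T + m * np.eye(m)

F = np.linalg.inv(np.linalg.inv(B) + H.T @ np.linalg.inv(R) @ H)
G = np.linalg.inv(np.eye(m) + np.linalg.inv(R) @ H @ B @ H.T)
print(np.allclose(F @ H.T, B @ H.T @ G))   # True
```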

  14. Working in the observation space. Preconditioned CG on the incremental 4D-Var cost function.
     Initialization steps; loop WHILE:
     1. $q_{i-1} = A p_{i-1} = (H^T R^{-1} H + B^{-1})\, p_{i-1}$
     2. $\alpha_{i-1} = r_{i-1}^T z_{i-1} / q_{i-1}^T p_{i-1}$
     3. $v_i = v_{i-1} + \alpha_{i-1} p_{i-1}$
     4. $r_i = r_{i-1} + \alpha_{i-1} q_{i-1}$
     5. $z_i = F r_i$
     6. $\beta_i = r_i^T z_i / r_{i-1}^T z_{i-1}$
     7. $p_i = -z_i + \beta_i p_{i-1}$

  15. Working in the observation space. A useful observation. Suppose that
     1. $B H^T G = F H^T$, and
     2. $v_0 = x_b - x_0$.
     Then there exist coordinate vectors $\tilde r_i$, $\tilde p_i$, $\tilde v_i$, $\tilde z_i$ and $\tilde q_i$ such that
     $$H^T \tilde r_i = r_i, \qquad B H^T \tilde p_i = p_i, \qquad v_0 + B H^T \tilde v_i = v_i, \qquad B H^T \tilde z_i = z_i, \qquad H^T \tilde q_i = q_i.$$
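
The snippet below verifies the $i = 0$ case of this correspondence on a toy problem: with $v_0 = x_b - x_0$, the initial primal residual $r_0 = A v_0 - b$ equals $H^T \tilde r_0$ with $\tilde r_0 = -R^{-1}(d - H(x_b - x_0))$. The sizes and the $R = I$ simplification are my assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 8, 3
H = rng.standard_normal((m, n))
Lb = rng.standard_normal((n, n)); B = Lb @ Lb.T + n * np.eye(n)
x_b, x0, d = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(m)

A = np.linalg.inv(B) + H.T @ H                 # B^{-1} + H^T R^{-1} H, with R = I
b = np.linalg.inv(B) @ (x_b - x0) + H.T @ d
v0 = x_b - x0
r0 = A @ v0 - b                                # primal initial residual
r0_tilde = -(d - H @ v0)                       # -R^{-1}(d - H(x_b - x0)), R = I
print(np.allclose(r0, H.T @ r0_tilde))         # True
```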

  16. Working in the observation space. Preconditioned CG on the incremental 4D-Var cost function (bis).
     Initialization steps: given $v_0$; $r_0 = (H^T R^{-1} H + B^{-1})\, v_0 - b$, ...
     Loop WHILE:
     1. $H^T \tilde q_{i-1} = H^T (R^{-1} H B H^T + I_m)\, \tilde p_{i-1}$
     2. $\alpha_{i-1} = r_{i-1}^T z_{i-1} / q_{i-1}^T p_{i-1}$
     3. $B H^T \tilde v_i = B H^T (\tilde v_{i-1} + \alpha_{i-1} \tilde p_{i-1})$
     4. $H^T \tilde r_i = H^T (\tilde r_{i-1} + \alpha_{i-1} \tilde q_{i-1})$
     5. $B H^T \tilde z_i = B H^T G \tilde r_i = F H^T \tilde r_i$
     6. $\beta_i = r_i^T z_i / r_{i-1}^T z_{i-1}$
     7. $B H^T \tilde p_i = B H^T (-\tilde z_i + \beta_i \tilde p_{i-1})$

  17. Working in the observation space. Restricted PCG (version 1): expensive.
     Initialization steps: given $v_0$; $r_0 = (H^T R^{-1} H + B^{-1})\, v_0 - b$, ...
     Loop WHILE:
     1. $\tilde q_{i-1} = (I_m + R^{-1} H B H^T)\, \tilde p_{i-1}$
     2. $\alpha_{i-1} = \tilde r_{i-1}^T H B H^T \tilde z_{i-1} \,/\, \tilde q_{i-1}^T H B H^T \tilde p_{i-1}$
     3. $\tilde v_i = \tilde v_{i-1} + \alpha_{i-1} \tilde p_{i-1}$
     4. $\tilde r_i = \tilde r_{i-1} + \alpha_{i-1} \tilde q_{i-1}$
     5. $\tilde z_i = G \tilde r_i$  (coordinate form of $z_i = F H^T \tilde r_i = B H^T G \tilde r_i$)
     6. $\beta_i = \tilde r_i^T H B H^T \tilde z_i \,/\, \tilde r_{i-1}^T H B H^T \tilde z_{i-1}$
     7. $\tilde p_i = -\tilde z_i + \beta_i \tilde p_{i-1}$
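
Here is a hedged NumPy sketch of this version-1 loop, mapping back to the primal increment at the end; it is dense and illustrative, and the final toy check (with $G = I_m$, i.e. the first-level preconditioner $F = B$) compares the result against the primal normal equations.

```python
import numpy as np

def rpcg_v1(H, B, R, G, d, x_b, x0, maxit=50, tol=1e-10):
    # Restricted PCG (version 1) sketch: all iterates live in R^m, and the
    # HBH^T-weighted inner products reproduce the primal CG scalars.
    HBHt = H @ B @ H.T
    Rinv = np.linalg.inv(R)
    v_t = np.zeros(d.size)                      # tilde-v: v_i = v_0 + B H^T v~_i
    r_t = -Rinv @ (d - H @ (x_b - x0))          # coordinates of r_0 = A v_0 - b
    z_t = G @ r_t
    p_t = -z_t
    for _ in range(maxit):
        rz = r_t @ (HBHt @ z_t)                 # = <r_i, z_i> in the primal space
        if np.sqrt(abs(rz)) < tol:
            break
        q_t = p_t + Rinv @ (HBHt @ p_t)         # (I_m + R^{-1} H B H^T) p~
        alpha = rz / (q_t @ (HBHt @ p_t))       # step length
        v_t = v_t + alpha * p_t
        r_t = r_t + alpha * q_t
        z_t = G @ r_t
        beta = (r_t @ (HBHt @ z_t)) / rz        # A-conjugacy, dual coordinates
        p_t = -z_t + beta * p_t
    return x_b - x0 + B @ H.T @ v_t             # back to the primal increment

rng = np.random.default_rng(8)
n, m = 12, 5
H = rng.standard_normal((m, n))
L = rng.standard_normal((n, n)); B = L @ L.T + n * np.eye(n)
R, G = np.eye(m), np.eye(m)
x_b, x0, d = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(m)
dx = rpcg_v1(H, B, R, G, d, x_b, x0)
A = np.linalg.inv(B) + H.T @ H
print(np.linalg.norm(A @ dx - (np.linalg.inv(B) @ (x_b - x0) + H.T @ d)))
```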

  18. Working in the observation space. More transformations.
     1. Consider $w$ and $t$ defined by $w_i = H B H^T \tilde z_i$ and $t_i = H B H^T \tilde p_i$.
     2. From Restricted PCG (version 1),
     $$t_i = \begin{cases} -w_0 & \text{if } i = 0, \\ -w_i + \beta_i t_{i-1} & \text{if } i > 0. \end{cases}$$
     3. Substitute these relations into Restricted PCG (version 1).
     4. This transforms Restricted PCG (version 1) into Restricted PCG (version 2).
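
A sketch of the resulting "version 2" loop, assuming $R = I_m$ for brevity: the recurrence $t_i = -w_i + \beta_i t_{i-1}$ supplies $H B H^T \tilde p_i$, so the expensive operator $H B H^T$ is applied only once per iteration, to $\tilde z_i$. Function and variable names are my own, and this is a reconstruction under the stated assumption rather than the talk's exact version 2.

```python
import numpy as np

def rpcg_v2(H, B, G, d, x_b, x0, maxit=50, tol=1e-10):
    # Version-2 idea: carry t = H B H^T p~ via t_i = -w_i + beta_i t_{i-1},
    # where w_i = H B H^T z~_i; illustrative, dense, with R = I assumed.
    HBHt = H @ B @ H.T
    v_t = np.zeros(d.size)
    r_t = -(d - H @ (x_b - x0))            # dual residual coordinates (R = I)
    z_t = G @ r_t
    w = HBHt @ z_t                          # w_0
    p_t, t = -z_t, -w                       # t_0 = -w_0
    for _ in range(maxit):
        rz = r_t @ w                        # <r, z> in the primal space
        if np.sqrt(abs(rz)) < tol:
            break
        q_t = p_t + t                       # (I_m + R^{-1} H B H^T) p~, R = I
        alpha = rz / (q_t @ t)              # step length (q~^T H B H^T p~ = q~^T t)
        v_t = v_t + alpha * p_t
        r_t = r_t + alpha * q_t
        z_t = G @ r_t
        w = HBHt @ z_t                      # the single H B H^T product
        beta = (r_t @ w) / rz
        p_t = -z_t + beta * p_t
        t = -w + beta * t                   # recurrence replaces H B H^T p~
    return x_b - x0 + B @ H.T @ v_t
```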
