Preconditioning of conjugate gradients in observation space for 4D-Var

S. Gratton (1,2), S. Gurol (2), Ph. L. Toint (3), J. Tshimanga (1)
1 ENSEEIHT, Toulouse, France
2 CERFACS, Toulouse, France
3 FUNDP, Namur, Belgium

Large-scale problems and applications in the Earth Sciences
Gratton, Gurol, Toint, Tshimanga, RICAM
Outline

1. Introduction
2. Algorithms for primal space
3. Working in the observation space
   - Krylov subspace methods
   - Results on realistic systems
   - Preconditioning
   - Convergence properties
4. Conclusions
Introduction
4D-Var problem: Formulation

Large-scale nonlinear weighted least-squares problem:

  min_{x in R^n} f(x) = (1/2) ||x - x_b||^2_{B^{-1}} + (1/2) Σ_{j=0}^{N} ||H_j(M_j(x)) - y_j||^2_{R_j^{-1}}

where:
- x in R^n is the control variable
- the observations y_j and the background x_b are noisy
- M_j are model operators
- H_j are observation operators
- B is the background-error covariance matrix
- R_j are the observation-error covariance matrices
Introduction
4D-Var problem: Formulation

Large-scale nonlinear weighted least-squares problem:

  min_{x in R^n} f(x) = (1/2) ||x - x_b||^2_{B^{-1}} + (1/2) Σ_{j=0}^{N} ||H_j(M_j(x)) - y_j||^2_{R_j^{-1}}

- Typically solved by a standard Gauss-Newton method, known as Incremental 4D-Var in the data assimilation community
- Solve the linearized subproblem at iteration k (linesearch or trust-region techniques require this kind of step):

  min_{δx in R^n} J(δx) = (1/2) ||δx - [x_b - x]||^2_{B^{-1}} + (1/2) ||H δx - d||^2_{R^{-1}}

- Sequence of quadratic minimization problems
- Perform the update x^{(k+1)}(t_0) = x^{(k)}(t_0) + δx^{(k)}
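The Gauss-Newton loop above can be sketched on a toy problem. The nonlinear operator, dimensions and noise levels below are all invented for illustration, and the quadratic subproblem is solved exactly with a dense solve rather than by CG.

```python
import numpy as np

# Toy sketch of incremental 4D-Var (Gauss-Newton); not the authors' code:
# the operator HM, all dimensions and noise levels are made up.
rng = np.random.default_rng(0)
n, m = 5, 3
B = np.eye(n)                          # background-error covariance
R = 0.1 * np.eye(m)                    # observation-error covariance
A0 = rng.standard_normal((m, n))

def HM(x):                             # nonlinear composite H(M(x)) (invented)
    return A0 @ x + 0.05 * np.sin(x[:m])

def jac(x):                            # its Jacobian: the linearized operator H
    J = A0.copy()
    J[:, :m] += 0.05 * np.diag(np.cos(x[:m]))
    return J

x_b = rng.standard_normal(n)           # background state
y = HM(x_b) + 0.01 * rng.standard_normal(m)   # synthetic observations

def cost(x):                           # the 4D-Var objective f(x)
    bx, r = x - x_b, HM(x) - y
    return 0.5 * bx @ np.linalg.solve(B, bx) + 0.5 * r @ np.linalg.solve(R, r)

x = np.zeros(n)
for k in range(10):                    # Gauss-Newton outer loop
    H = jac(x)
    d = y - HM(x)                      # innovation vector
    lhs = np.linalg.inv(B) + H.T @ np.linalg.solve(R, H)
    rhs = np.linalg.solve(B, x_b - x) + H.T @ np.linalg.solve(R, d)
    dx = np.linalg.solve(lhs, rhs)     # inner quadratic subproblem (exact here)
    x = x + dx                         # update x^{(k+1)} = x^{(k)} + dx^{(k)}
```

On this mildly nonlinear problem the outer loop drives the step to zero and reduces the cost well below its value at the starting point.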
Introduction
Quick note on globalization

- We want to converge to a point where the gradient is zero, from any starting point
- Technically, if the function is regular (gradient Lipschitz continuous, bounded from below), an "approximate" gradient is all that is needed for a globally convergent algorithm
- There are basically two techniques to obtain provable global convergence: linesearch and trust region
- In linesearch, a steplength is controlled along a descent direction
- In trust regions, a model is minimized inside a sphere; in the case of the Taylor model, truncated conjugate gradients (CG) do this
- In the rest of the talk, we focus on CG, and assume it is suitably truncated
Introduction
4D-Var problem: Solution

From the optimality condition,

  (B^{-1} + H^T R^{-1} H) δx = B^{-1} (x_b - x) + H^T R^{-1} d

The aim is to approximately solve sequences of this linear system.
Solution algorithms: conjugate-gradient-like methods.

The exact solution writes:

  x_b - x_0 + (B^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1} (d - H(x_b - x_0))
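One can check numerically that this closed-form expression satisfies the optimality condition; the SPD covariances and all dimensions below are random, purely for illustration.

```python
import numpy as np

# Numerical check that the exact-solution formula satisfies the
# optimality condition; all data random, for illustration only.
rng = np.random.default_rng(1)
n, m = 8, 3
Bh = rng.standard_normal((n, n)); B = Bh @ Bh.T + n * np.eye(n)   # SPD B
Rh = rng.standard_normal((m, m)); R = Rh @ Rh.T + m * np.eye(m)   # SPD R
H = rng.standard_normal((m, n))
x_b, x0 = rng.standard_normal(n), rng.standard_normal(n)
d = rng.standard_normal(m)

Binv, Rinv = np.linalg.inv(B), np.linalg.inv(R)
lhs = Binv + H.T @ Rinv @ H                       # Hessian of the subproblem
rhs = Binv @ (x_b - x0) + H.T @ Rinv @ d          # right-hand side
dx = (x_b - x0) + np.linalg.solve(lhs, H.T @ Rinv @ (d - H @ (x_b - x0)))
assert np.allclose(lhs @ dx, rhs)                 # optimality condition holds
```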
Algorithms for primal space
Preconditioned Lanczos algorithm (F^{1/2} is not required!)

For i = 1, 2, ..., l:
  1. w_i = (B^{-1} + H^T R^{-1} H) z_i          → construction of the Krylov sequence
  2. w_i = w_i - β_i v_{i-1}
  3. α_i = <w_i, z_i>                           → orthogonalization
  4. w_{i+1} = w_i - α_i v_i
  5. z_{i+1} = F w_{i+1}                        → apply preconditioner
  6. β_{i+1} = <z_{i+1}, w_{i+1}>^{1/2}
  7. v_{i+1} = w_{i+1} / β_{i+1}                → normalization
  8. z_{i+1} = z_{i+1} / β_{i+1}
  9. V = [V, v_{i+1}]                           → orthonormal basis for the Krylov subspace
  10. T_{i,i} = α_i ;  T_{i+1,i} = T_{i,i+1} = β_{i+1}   → generate the tridiagonal matrix

Solution:
  1. y_l = T_l^{-1} (β_0 e_1)                   → impose the condition r_k ⊥ K_l(A, r_0)
  2. δx_l = F V_l y_l                           → find the approximate solution
Algorithms for primal space
Preconditioned CG algorithm (F^{1/2} is not required!)

Initialization: r_0 = b - A δx_0 , z_0 = F r_0 , p_0 = z_0

For i = 0, 1, ...:
  1. q_i = (B^{-1} + H^T R^{-1} H) p_i
  2. α_i = <r_i, z_i> / <q_i, p_i>              → compute the step length
  3. δx_{i+1} = δx_i + α_i p_i                  → update the iterate
  4. r_{i+1} = r_i - α_i q_i                    → update the residual
  5. r_{i+1} = r_{i+1} - R Z^T r_{i+1}          → re-orthogonalization
  6. z_{i+1} = F r_{i+1}                        → update the preconditioned residual
  7. β_i = <r_{i+1}, z_{i+1}> / <r_i, z_i>      → ensure A-conjugate directions
  8. R = [R, r_{i+1}/β_i]                       → store for re-orthogonalization
  9. Z = [Z, z_{i+1}/β_i]                       → store for re-orthogonalization
  10. p_{i+1} = z_{i+1} + β_i p_i               → update the descent direction
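A minimal dense sketch of this preconditioned CG, taking F = B as the preconditioner; the re-orthogonalization steps are omitted since the example is tiny, and all data are random and illustrative.

```python
import numpy as np

# Minimal preconditioned CG on (B^{-1} + H^T R^{-1} H) x = b with F = B.
# Re-orthogonalization is omitted for this tiny dense example; data random.
rng = np.random.default_rng(2)
n, m = 8, 3
Bh = rng.standard_normal((n, n)); B = Bh @ Bh.T + n * np.eye(n)
Rh = rng.standard_normal((m, m)); R = Rh @ Rh.T + m * np.eye(m)
H = rng.standard_normal((m, n))
Binv, Rinv = np.linalg.inv(B), np.linalg.inv(R)
A = Binv + H.T @ Rinv @ H
b = rng.standard_normal(n)

x = np.zeros(n)
r = b - A @ x                      # residual r_0 = b - A x_0
z = B @ r                          # preconditioned residual z_0 = F r_0
p = z.copy()
for i in range(n):
    rz = r @ z
    if rz < 1e-28:                 # converged
        break
    q = A @ p
    alpha = rz / (q @ p)           # step length
    x += alpha * p                 # update the iterate
    r -= alpha * q                 # update the residual
    z = B @ r                      # apply the preconditioner F = B
    beta = (r @ z) / rz            # ensure A-conjugate directions
    p = z + beta * p               # update the descent direction

assert np.allclose(x, np.linalg.solve(A, b))
```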
Algorithms for primal space
Inner minimization

Minimizing

  J[δx_0] = (1/2) ||δx_0 - [x_b - x_0]||^2_{B^{-1}} + (1/2) ||H δx_0 - d||^2_{R^{-1}}

amounts to solving

  (B^{-1} + H^T R^{-1} H) δx_0 = B^{-1} (x_b - x_0) + H^T R^{-1} d.

The exact solution writes

  x_b - x_0 + (B^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1} (d - H(x_b - x_0)),

or equivalently (using the Sherman-Morrison-Woodbury formula)

  x_b - x_0 + B H^T (R + H B H^T)^{-1} (d - H(x_b - x_0)).
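The equivalence of the two expressions can be verified numerically; the SPD covariances and the innovation vector below are random stand-ins, for illustration only.

```python
import numpy as np

# Check the Sherman-Morrison-Woodbury identity used above, on random data.
rng = np.random.default_rng(3)
n, m = 8, 3
Bh = rng.standard_normal((n, n)); B = Bh @ Bh.T + n * np.eye(n)
Rh = rng.standard_normal((m, m)); R = Rh @ Rh.T + m * np.eye(m)
H = rng.standard_normal((m, n))
dtil = rng.standard_normal(m)          # plays the role of d - H(x_b - x_0)

Binv, Rinv = np.linalg.inv(B), np.linalg.inv(R)
primal = np.linalg.solve(Binv + H.T @ Rinv @ H, H.T @ Rinv @ dtil)
dual = B @ H.T @ np.linalg.solve(R + H @ B @ H.T, dtil)
assert np.allclose(primal, dual)       # both expressions give the same increment
```

Note that the dual form only requires solving an m x m system, which is the whole point when m << n.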
Algorithms for primal space
Dual formulation: PSAS

1. Very popular when there are few observations compared to model variables.
2. Relies on

   x_b - x_0 + B H^T (R + H B H^T)^{-1} (d - H(x_b - x_0))

3. Iteratively solve

   (I + R^{-1} H B H^T) v̂ = R^{-1} (d - H(x_b - x_0))

   for v̂
4. Set δx_0 = x_b - x_0 + B H^T v̂
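A sketch of the PSAS steps; the m x m dual system is solved directly here rather than iteratively, and all data are random and illustrative.

```python
import numpy as np

# PSAS steps on random data: solve the m-by-m dual system, map back with
# B H^T, and compare with the primal solution. Direct solves stand in for
# the iterative method on this tiny example.
rng = np.random.default_rng(4)
n, m = 8, 3
Bh = rng.standard_normal((n, n)); B = Bh @ Bh.T + n * np.eye(n)
Rh = rng.standard_normal((m, m)); R = Rh @ Rh.T + m * np.eye(m)
H = rng.standard_normal((m, n))
x_b, x0 = rng.standard_normal(n), rng.standard_normal(n)
d = rng.standard_normal(m)
dtil = d - H @ (x_b - x0)              # innovation for the dual system

Binv, Rinv = np.linalg.inv(B), np.linalg.inv(R)
vhat = np.linalg.solve(np.eye(m) + Rinv @ H @ B @ H.T, Rinv @ dtil)
dx_psas = (x_b - x0) + B @ H.T @ vhat  # map back to model space

dx_primal = np.linalg.solve(Binv + H.T @ Rinv @ H,
                            Binv @ (x_b - x0) + H.T @ Rinv @ d)
assert np.allclose(dx_psas, dx_primal)
```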
Algorithms for primal space
Experiments

[Figure: quadratic cost function versus iterations (0-80) for RPCG and PSAS, with size(B) = 1024 and size(R) = 256; cost plotted on a logarithmic scale from about 10^2 to 10^5]
Algorithms for primal space
Motivation: PSAS and CG-like algorithms

1. CG minimizes the Incremental 4D-Var function during its iterations. It minimizes a quadratic approximation of the non-quadratic function: Gauss-Newton in the model space.
2. PSAS does not minimize the Incremental 4D-Var function during its iterations, but works in the observation space.

Our goal: combine the advantages of both approaches in a trust-region framework, to guarantee convergence:
- keeping the variational property, to get the so-called Cauchy decrease even when iterations are truncated;
- being computationally efficient whenever the number of observations is significantly smaller than the size of the state vector;
- getting global convergence in the observation space!
Algorithms for primal space
Dual CG algorithm: assumptions

1. Suppose the CG algorithm is applied to solve the Incremental 4D-Var subproblem using a preconditioning matrix F
2. Suppose there exists an m x m matrix G such that

   F H^T = B H^T G

For "exact" preconditioners,

   (B^{-1} + H^T R^{-1} H)^{-1} H^T = B H^T (I + R^{-1} H B H^T)^{-1}
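Both the "exact" pair F = (B^{-1} + H^T R^{-1} H)^{-1}, G = (I + R^{-1} H B H^T)^{-1} and the trivial pair F = B, G = I_m satisfy the assumption F H^T = B H^T G; a quick numerical check on random, illustrative data:

```python
import numpy as np

# Verify F H^T = B H^T G for the "exact" preconditioner pair on random data.
rng = np.random.default_rng(5)
n, m = 8, 3
Bh = rng.standard_normal((n, n)); B = Bh @ Bh.T + n * np.eye(n)
Rh = rng.standard_normal((m, m)); R = Rh @ Rh.T + m * np.eye(m)
H = rng.standard_normal((m, n))
Binv, Rinv = np.linalg.inv(B), np.linalg.inv(R)

F = np.linalg.inv(Binv + H.T @ Rinv @ H)           # exact primal preconditioner
G = np.linalg.inv(np.eye(m) + Rinv @ H @ B @ H.T)  # its dual counterpart
assert np.allclose(F @ H.T, B @ H.T @ G)           # F H^T = B H^T G
# The trivial pair F = B, G = I_m satisfies the same relation.
assert np.allclose(B @ H.T, B @ H.T @ np.eye(m))
```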
Working in the observation space
Preconditioned CG on the Incremental 4D-Var cost function

Initialization steps
Loop: WHILE ...
  1. q_{i-1} = A p_{i-1} = (H^T R^{-1} H + B^{-1}) p_{i-1}
  2. α_{i-1} = r^T_{i-1} z_{i-1} / q^T_{i-1} p_{i-1}
  3. v_i = v_{i-1} + α_{i-1} p_{i-1}
  4. r_i = r_{i-1} + α_{i-1} q_{i-1}
  5. z_i = F r_i
  6. β_i = r^T_i z_i / r^T_{i-1} z_{i-1}
  7. p_i = -z_i + β_i p_{i-1}
Working in the observation space
A useful observation

Suppose that
1. B H^T G = F H^T,
2. v_0 = x_b - x_0.

Then there exist coordinate vectors r̂_i, p̂_i, v̂_i, ẑ_i and q̂_i such that

  r_i = H^T r̂_i ,  p_i = B H^T p̂_i ,  v_i = v_0 + B H^T v̂_i ,  z_i = B H^T ẑ_i ,  q_i = H^T q̂_i
Working in the observation space
Preconditioned CG on the Incremental 4D-Var cost function (bis)

Initialization steps: given v_0 ; r_0 = (H^T R^{-1} H + B^{-1}) v_0 - b , ...

Loop: WHILE ...
  1. H^T q̂_{i-1} = H^T (R^{-1} H B H^T + I_m) p̂_{i-1}
  2. α_{i-1} = r^T_{i-1} z_{i-1} / q̂^T_{i-1} H B H^T p̂_{i-1}
  3. B H^T v̂_i = B H^T (v̂_{i-1} + α_{i-1} p̂_{i-1})
  4. H^T r̂_i = H^T (r̂_{i-1} + α_{i-1} q̂_{i-1})
  5. B H^T ẑ_i = F H^T r̂_i = B H^T G r̂_i
  6. β_i = r^T_i z_i / r^T_{i-1} z_{i-1}
  7. B H^T p̂_i = B H^T (-ẑ_i + β_i p̂_{i-1})
Working in the observation space
Restricted PCG (version 1): expensive

Initialization steps: given v_0 ; r_0 = (H^T R^{-1} H + B^{-1}) v_0 - b , ...

Loop: WHILE ...
  1. q̂_{i-1} = (I_m + R^{-1} H B H^T) p̂_{i-1}
  2. α_{i-1} = r̂^T_{i-1} H B H^T ẑ_{i-1} / q̂^T_{i-1} H B H^T p̂_{i-1}
  3. v̂_i = v̂_{i-1} + α_{i-1} p̂_{i-1}
  4. r̂_i = r̂_{i-1} + α_{i-1} q̂_{i-1}
  5. ẑ_i = G r̂_i
  6. β_i = r̂^T_i H B H^T ẑ_i / r̂^T_{i-1} H B H^T ẑ_{i-1}
  7. p̂_i = -ẑ_i + β_i p̂_{i-1}
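A dense sketch of this restricted (dual-space) PCG with the trivial admissible pair F = B, G = I_m, checked against the primal solution; all data are random and illustrative, and hatted slide quantities are written as plain names.

```python
import numpy as np

# Restricted PCG (version 1) sketch with F = B, G = I_m on random data.
# Hatted m-vectors from the slides appear here as vhat, rhat, zhat, phat.
rng = np.random.default_rng(6)
n, m = 8, 3
Bh = rng.standard_normal((n, n)); B = Bh @ Bh.T + n * np.eye(n)
Rh = rng.standard_normal((m, m)); R = Rh @ Rh.T + m * np.eye(m)
H = rng.standard_normal((m, n))
x_b, x0 = rng.standard_normal(n), rng.standard_normal(n)
d = rng.standard_normal(m)

Binv, Rinv = np.linalg.inv(B), np.linalg.inv(R)
S = H @ B @ H.T                          # H B H^T, the dual weighting matrix
dtil = d - H @ (x_b - x0)

# Dual initialization: v_0 = x_b - x0 gives vhat_0 = 0, and
# r_0 = A v_0 - b = -H^T R^{-1} dtil, i.e. rhat_0 = -R^{-1} dtil.
vhat = np.zeros(m)
rhat = -np.linalg.solve(R, dtil)
zhat = rhat.copy()                       # apply G = I_m
phat = -zhat
for i in range(m):                       # exact in at most m iterations
    rz = rhat @ S @ zhat                 # r^T z in dual coordinates
    if rz < 1e-28:
        break
    qhat = phat + np.linalg.solve(R, S @ phat)  # (I_m + R^{-1} H B H^T) phat
    alpha = rz / (qhat @ S @ phat)
    vhat = vhat + alpha * phat
    rhat = rhat + alpha * qhat
    zhat = rhat                          # apply G = I_m
    beta = (rhat @ S @ zhat) / rz
    phat = -zhat + beta * phat

dx_dual = (x_b - x0) + B @ H.T @ vhat    # map back to model space
dx_primal = np.linalg.solve(Binv + H.T @ Rinv @ H,
                            Binv @ (x_b - x0) + H.T @ Rinv @ d)
assert np.allclose(dx_dual, dx_primal)
```

Every iteration works only with m-vectors; the model-space quantities appear solely through the fixed products H B H^T and B H^T, which is why the method pays off when m << n.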
Working in the observation space
More transformations

1. Consider w and t defined by

   w_i = H B H^T ẑ_i   and   t_i = H B H^T p̂_i

2. From Restricted PCG (version 1),

   t_i = -w_0                     if i = 0,
   t_i = -w_i + β_i t_{i-1}       if i > 0

3. Use these relations in Restricted PCG (version 1)
4. Transform Restricted PCG (version 1) into Restricted PCG (version 2)