Parallel Algorithms for Four-Dimensional Variational Data Assimilation Mike Fisher ECMWF October 24, 2011 Mike Fisher (ECMWF) Parallel 4D-Var October 24, 2011 1 / 37
Brief Introduction to 4D-Var Four-Dimensional Variational Data Assimilation 4D-Var is the method used by most national and international Numerical Weather Forecasting Centres to provide initial conditions for their forecast models. 4D-Var combines observations with a prior estimate of the state, provided by an earlier forecast. The method is described as Four-Dimensional because it takes into account observations that are distributed in space and over an interval of time (typically 6 or 12 hours), often called the analysis window. It does this by using a complex and computationally expensive numerical model to propagate information in time. In many applications of 4D-Var, the model is assumed to be perfect. In this talk, I will concentrate on so-called weak-constraint 4D-Var, which takes into account imperfections in the model. Mike Fisher (ECMWF) Parallel 4D-Var October 24, 2011 2 / 37
Brief Introduction to 4D-Var Weak-Constraint 4D-Var represents the data-assimilation problem as a very large least-squares problem. 1 2 ( x 0 − x b ) T B − 1 ( x 0 − x b ) J ( x 0 , x 1 , . . . , x N ) = N +1 ( y k − H k ( x k )) T R − 1 � ( y k − H k ( x k )) k 2 k =0 N +1 q ) T Q − 1 � ( q k − ¯ ( q k − ¯ q ) k 2 k =1 where q k = x k − M k ( x k − 1 ). Here, the cost function J is a function of the states x 0 , x 1 , . . . , x N defined at the start of each of a set of sub-windows that span the analysis window. Mike Fisher (ECMWF) Parallel 4D-Var October 24, 2011 3 / 37
Brief Introduction to 4D-Var 1 2 ( x 0 − x b ) T B − 1 ( x 0 − x b ) J ( x 0 , x 1 , . . . , x N ) = N +1 ( y k − H k ( x k )) T R − 1 � ( y k − H k ( x k )) k 2 k =0 N +1 q ) T Q − 1 � ( q k − ¯ ( q k − ¯ q ) k 2 k =1 Each x i contains ≈ 10 7 elements. Each y i contains ≈ 10 5 elements. The operators H k and M k , and the matrices B , R k and Q k are represented by codes that apply them to vectors. We do not have access to their elements. Mike Fisher (ECMWF) Parallel 4D-Var October 24, 2011 4 / 37
Brief Introduction to 4D-Var 1 2 ( x 0 − x b ) T B − 1 ( x 0 − x b ) J ( x 0 , x 1 , . . . , x N ) = N +1 ( y k − H k ( x k )) T R − 1 � ( y k − H k ( x k )) k 2 k =0 N +1 q ) T Q − 1 � ( q k − ¯ ( q k − ¯ q ) k 2 k =1 H k and M k involve integrations of the numerical model, and are computationally expensive. The covariance matrices B , R k and Q k are less expensive to apply. Mike Fisher (ECMWF) Parallel 4D-Var October 24, 2011 5 / 37
Brief Introduction to 4D-Var x q 3 q 1 x 1 q 2 x 3 x 2 x 0 time Mike Fisher (ECMWF) Parallel 4D-Var October 24, 2011 6 / 37
Brief Introduction to 4D-Var x q 3 q 1 x 1 q 2 x 3 x 2 x 0 time The cost function contains 3 terms: 2 ( x 0 − x b ) T B − 1 ( x 0 − x b ) ensures that x 0 stays close to the prior 1 estimate. k =0 ( y k − H k ( x k )) T R − 1 � N 1 ( y k − H k ( x k )) keeps the estimate close 2 k to the observations (blue circles). q ) T Q − 1 1 � N k =1 ( q k − ¯ ( q k − ¯ q ) keeps the jumps between 2 k sub-windows small. Mike Fisher (ECMWF) Parallel 4D-Var October 24, 2011 7 / 37
Gauss-Newton (Incremental) Algorithm It is usual to minimize the cost function using a modified Gauss-Newton algorithm. Nearly all the computational cost is in the inner loop, which minimizes the quadratic cost function: 1 δ x 0 − b ( n ) � T � B − 1 � δ x 0 − b ( n ) � ˆ J ( δ x ( n ) 0 , . . . , δ x ( n ) N ) = 2 N +1 � T � � � H ( n ) k δ x k − d ( n ) H ( n ) k δ x k − d ( n ) � R − 1 k k k 2 k =0 N +1 � T � � � δ q k − c ( n ) δ q k − c ( n ) � Q − 1 k k k 2 k =1 δ q k = δ x k − M ( n ) k δ x k − 1 , and where b ( n ) , c ( n ) and d ( n ) come from the outer loop: k k b ( n ) = x b − x ( n ) c ( n ) q − q ( n ) d ( n ) = y k − H k ( x ( n ) = ¯ k ) 0 , k , k k Mike Fisher (ECMWF) Parallel 4D-Var October 24, 2011 8 / 37
Gauss-Newton (Incremental) Algorithm 1 δ x 0 − b ( n ) � T � B − 1 � δ x 0 − b ( n ) � J ( δ x ( n ) 0 , . . . , δ x ( n ) ˆ N ) = 2 N +1 � T � H ( n ) k δ x k − d ( n ) � H ( n ) k δ x k − d ( n ) � � R − 1 k k k 2 k =0 N +1 � T � � � δ q k − c ( n ) δ q k − c ( n ) � Q − 1 k k k 2 k =1 Note that 4D-Var requires tangent linear versions of M k and H k : • M ( n ) and H ( n ) k , respectively k It also requires the transposes (adjoints) of these operators: • ( M ( n ) k ) T and ( H ( n ) k ) T , respectively Mike Fisher (ECMWF) Parallel 4D-Var October 24, 2011 9 / 37
Why do we need more parallel algorithms? In its usual implementation, 4D-Var is solved by applying a conjugate-gradient solver. This is highly sequential: • Iterations of CG. • Tangent Linear and Adjoint integrations run one after the other. • Model timesteps follow each other. Mike Fisher (ECMWF) Parallel 4D-Var October 24, 2011 10 / 37
Why do we need more parallel algorithms? In its usual implementation, 4D-Var is solved by applying a conjugate-gradient solver. This is highly sequential: • Iterations of CG. • Tangent Linear and Adjoint integrations run one after the other. • Model timesteps follow each other. Computers are becoming ever more parallel, but processors are not getting faster. Unless we do something to make 4D-Var more parallel, we will soon find that 4D-Var becomes un-affordable (even with a 12-hour window). Mike Fisher (ECMWF) Parallel 4D-Var October 24, 2011 10 / 37
Why do we need more parallel algorithms? In its usual implementation, 4D-Var is solved by applying a conjugate-gradient solver. This is highly sequential: • Iterations of CG. • Tangent Linear and Adjoint integrations run one after the other. • Model timesteps follow each other. Computers are becoming ever more parallel, but processors are not getting faster. Unless we do something to make 4D-Var more parallel, we will soon find that 4D-Var becomes un-affordable (even with a 12-hour window). We cannot make the model more parallel. • The inner loops of 4D-Var run with a few 10’s of grid columns per processor. • This is barely enough to mask inter-processor communication costs. Mike Fisher (ECMWF) Parallel 4D-Var October 24, 2011 10 / 37
Why do we need more parallel algorithms? In its usual implementation, 4D-Var is solved by applying a conjugate-gradient solver. This is highly sequential: • Iterations of CG. • Tangent Linear and Adjoint integrations run one after the other. • Model timesteps follow each other. Computers are becoming ever more parallel, but processors are not getting faster. Unless we do something to make 4D-Var more parallel, we will soon find that 4D-Var becomes un-affordable (even with a 12-hour window). We cannot make the model more parallel. • The inner loops of 4D-Var run with a few 10’s of grid columns per processor. • This is barely enough to mask inter-processor communication costs. We have to use more parallel algorithms. Mike Fisher (ECMWF) Parallel 4D-Var October 24, 2011 10 / 37
Parallelising within an Iteration The model is already parallel in both horizontal directions. The modellers tell us that it will be hard to parallelise in the vertical (and we already have too little work per processor). We are left with parallelising in the time direction. Mike Fisher (ECMWF) Parallel 4D-Var October 24, 2011 11 / 37
Weak Constraint 4D-Var: Inner Loop Dropping the outer loop index ( n ), the inner loop of weak-constraints 4D-Var minimises: 1 2 ( δ x 0 − b ) T B − 1 ( δ x 0 − b ) ˆ J ( δ x 0 , . . . , δ x N ) = N +1 ( H k δ x k − d k ) T R − 1 � ( H k δ x k − d k ) k 2 k =0 N +1 ( δ q k − c k ) T Q − 1 � ( δ q k − c k ) k 2 k =1 where δ q k = δ x k − M k δ x k − 1 , and where b , c k and d k come from the outer loop: b = x b − x 0 c k = ¯ q − q k d k = y k − H k ( x k ) Mike Fisher (ECMWF) Parallel 4D-Var October 24, 2011 12 / 37
Weak Constraint 4D-Var: Inner Loop We can simplify this further by defining some 4D vectors and matrices: δ x 0 δ x 0 δ x 1 δ q 1 δ x = δ p = . . . . . . δ x N δ q N These vectors are related through δ q k = δ x k − M k δ x k − 1 . We can write this relationship in matrix form as: δ p = L δ x where: I − M 1 I − M 2 I L = ... ... − M N I Mike Fisher (ECMWF) Parallel 4D-Var October 24, 2011 13 / 37
Weak Constraint 4D-Var: Inner Loop I − M 1 I − M 2 I L = ... ... − M N I δ p = L δ x can be done in parallel: δ q k = δ x k − M k δ x k − 1 . We know all the δ x k − 1 ′ s . We can apply all the M k ′ s simultaneously. δ x = L − 1 δ p is sequential: δ x k = M k δ x k − 1 + δ q k . We have to generate each δ x k − 1 in turn before we can apply the next M k . Mike Fisher (ECMWF) Parallel 4D-Var October 24, 2011 14 / 37
Brief Introduction to 4D-Var x q 3 q 1 x 1 q 2 x 3 x 2 x 0 time Mike Fisher (ECMWF) Parallel 4D-Var October 24, 2011 15 / 37
Recommend
More recommend