Improving EnKF with machine learning algorithms
John Harlim
Department of Mathematics and Department of Meteorology, The Pennsylvania State University
June 12, 2017
Overview
◮ A supervised learning algorithm
◮ An unsupervised learning algorithm (diffusion maps)
◮ Learning the localization function of EnKF
◮ Learning a likelihood function
◮ Application: correcting biased observation model error in DA
A supervised learning algorithm
The basic idea of a supervised learning algorithm is to train a map $H: X \to Y$ from a set of paired data $\{x_i, y_i\}_{i=1,\dots,N}$.
Remarks:
◮ The objective is to use the estimated map $\hat H$ to predict $y_s = \hat H(x_s)$ given new data $x_s$.
◮ Various methods to estimate $H$ include regression, SVM, k-NN, neural networks, etc.
◮ For this talk, we will focus on how to use regression in appropriate spaces to improve EnKF (a minimal sketch follows below).
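To fix notation, the sketch below fits the simplest such map, a linear least-squares regression, from training pairs; the function name and the choice of a linear map are illustrative assumptions, not part of the talk.

```python
import numpy as np

def fit_linear_map(x_train, y_train):
    """Estimate a linear map H such that y ≈ x @ H from training pairs.
    x_train: (N, dx) inputs, y_train: (N, dy) outputs."""
    H, *_ = np.linalg.lstsq(x_train, y_train, rcond=None)  # min_H ||x_train H - y_train||^2
    return H

# Prediction on new data x_s (shape (dx,) or (Ns, dx)):
#   y_s = x_s @ fit_linear_map(x_train, y_train)
```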
An unsupervised learning algorithm
Given a data set $\{x_i\}$, the main task is to learn a function $\varphi(x_i)$ that can describe the data.
In this talk, I will focus on a nonlinear manifold learning algorithm, the diffusion maps¹: given $\{x_i\} \in \mathcal{M} \subset \mathbb{R}^n$ with a sampling measure $q$, the diffusion maps algorithm is a kernel-based method that produces orthonormal basis functions on the manifold, $\varphi_k \in L^2(\mathcal{M}, q)$.
These basis functions are solutions of an eigenvalue problem,
$$q^{-1}\,\mathrm{div}\big(q \nabla \varphi_k(x)\big) = \lambda_k \varphi_k(x),$$
where the weighted Laplacian operator is approximated with an integral operator with appropriate normalization (a numerical sketch follows below).
¹ Coifman & Lafon 2006; Berry & H, 2016.
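A minimal sketch of the standard construction (Gaussian kernel with the α = 1 density normalization of Coifman & Lafon), assuming the bandwidth ε is already tuned; the eigenvectors of the resulting Markov matrix approximate the basis functions $\varphi_k$ evaluated at the samples.

```python
import numpy as np

def diffusion_maps(X, eps, num_eig=10):
    """X: (N, n) samples on the manifold; eps: kernel bandwidth (assumed tuned).
    Returns approximate eigenvalues and sampled eigenfunctions phi_k(x_i)."""
    D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
    K = np.exp(-D2 / (4.0 * eps))                               # Gaussian kernel
    q = K.sum(axis=1)                                           # kernel density estimate
    K1 = K / np.outer(q, q)                                     # alpha = 1: remove density bias
    d = K1.sum(axis=1)
    S = K1 / np.sqrt(np.outer(d, d))                            # symmetric conjugate of D^{-1} K1
    evals, U = np.linalg.eigh(S)
    idx = np.argsort(-evals)[:num_eig]                          # leading eigenvalues first
    phi = U[:, idx] / np.sqrt(d)[:, None]                       # eigenvectors of the Markov matrix
    lam = (1.0 - evals[idx]) / eps                              # approximate Laplacian eigenvalues
    return lam, phi
```

For data sampled uniformly on a circle, the columns of `phi` approximate the Fourier modes shown on the next slide.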
Examples
Example: For uniformly distributed data on a circle, we obtain the Fourier basis. [Figure: estimated basis functions compared with $e^{ix}$, $e^{i2x}$, $e^{i3x}$.]
Example: For Gaussian distributed data on the real line, we obtain the Hermite polynomials. [Figure: estimate vs. truth for $\varphi_1(x)$, $\varphi_2(x)$, $\varphi_3(x)$.]
Example: Nonparametric basis functions estimated on a nontrivial manifold. [Figure: estimated basis functions on the manifold.]
Remark: Essentially, one can view the DM as a method to learn a generalized Fourier basis on the manifold.
Learning the localization function of EnKF
◮ When EnKF is performed with a small ensemble size, one way to alleviate the spurious correlations is to employ a localization function.
◮ For example, in the serial EnKF, for each scalar observation $y_i$, one "localizes" the Kalman gain,
$$K_i = L_{x y_i} \circ X Y_i^\top \big(Y_i Y_i^\top + R\big)^{-1},$$
with an empirically chosen localization function $L_{x y_i}$ (Gaspari-Cohn, etc.), which requires some tuning (a minimal sketch of this step is given below).
◮ Let's use the idea from machine learning to train this localization function. The key idea is to find a map that takes poorly estimated correlations to accurately estimated correlations.
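For concreteness, here is a minimal sketch of the localized gain for one scalar observation in a serial EnKF, using the Gaspari-Cohn function as the empirically chosen localization; the ensemble normalization by $K-1$ and the half-width convention for the distance argument are assumptions for illustration.

```python
import numpy as np

def gaspari_cohn(r):
    """Gaspari-Cohn 5th-order compactly supported correlation; r = distance / half-width."""
    r = np.abs(np.asarray(r, dtype=float))
    gc = np.zeros_like(r)
    m1 = r <= 1.0
    m2 = (r > 1.0) & (r < 2.0)
    r1, r2 = r[m1], r[m2]
    gc[m1] = -0.25*r1**5 + 0.5*r1**4 + 0.625*r1**3 - (5.0/3.0)*r1**2 + 1.0
    gc[m2] = (1.0/12.0)*r2**5 - 0.5*r2**4 + 0.625*r2**3 + (5.0/3.0)*r2**2 - 5.0*r2 + 4.0 - 2.0/(3.0*r2)
    return gc

def localized_gain(Xp, Yi, R, dist, halfwidth):
    """Localized gain K_i = L ∘ X Y_i^T (Y_i Y_i^T + R)^{-1} for one scalar observation.
    Xp: (n, K) state perturbations; Yi: (K,) observation-space perturbations;
    dist: (n,) distances from each state variable to the observation location."""
    K = Xp.shape[1]
    Pxy = Xp @ Yi / (K - 1)              # state-observation covariances (n,)
    Pyy = Yi @ Yi / (K - 1)              # observation-space variance (scalar)
    L = gaspari_cohn(dist / halfwidth)   # empirically chosen localization (Schur product)
    return L * Pxy / (Pyy + R)
```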
Learning localization map²
Given a set of large-ensemble EnKF solutions, $\{x_m^{a,k}\}_{k=1,\dots,L;\; m=1,\dots,M}$, as a training data set, where $L$ is large enough so that the correlation $\rho_{ij}^L \approx \rho(x_i, y_j)$ is accurate.
◮ Operationally, we wish to run EnKF with $K \ll L$ ensemble members. Then our goal is to train a map that transforms the subsampled correlation $\rho_{ij}^K$ into the accurate correlation $\rho_{ij}^L$.
◮ Basically, we consider the following optimization problem,
$$\min_{L_{x_i y_j}} \int_{[-1,1]}\int_{[-1,1]} \big(L_{x_i y_j}\rho_{ij}^K - \rho_{ij}^L\big)^2\, p(\rho_{ij}^K \mid \rho_{ij}^L)\, p(\rho_{ij}^L)\, d\rho_{ij}^K\, d\rho_{ij}^L \;\overset{\mathrm{MC}}{\approx}\; \min_{L_{x_i y_j}} \frac{1}{MS}\sum_{m,s=1}^{M,S} \big(L_{x_i y_j}\rho_{ij,m,s}^K - \rho_{ij,m}^L\big)^2,$$
where $\rho_{ij,m}^L \sim p(\rho_{ij}^L)$ and $\rho_{ij,m,s}^K \sim p(\rho_{ij}^K \mid \rho_{ij}^L)$ is an estimated correlation using only $K$ out of the $L$ training members (a Monte Carlo sketch is given below).
² De La Chevrotière & H, 2017.
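A minimal sketch of the Monte Carlo least-squares problem above for a single pair (i, j): it treats $L_{x_i y_j}$ as a single scalar weight, for which the minimizer has a closed form; the sub-ensemble sampling scheme and the scalar (rather than correlation-dependent) weight are simplifying assumptions for illustration.

```python
import numpy as np

def learn_localization_weight(xL, yL, K, S, rng=np.random.default_rng(0)):
    """xL, yL: (M, L) arrays holding, for each of M analysis cycles, the L-member
    training ensembles of state component x_i and observed quantity y_j.
    rho^L_m is the full-ensemble correlation at cycle m, and rho^K_{m,s}
    (s = 1, ..., S) are correlations of random K-member sub-ensembles."""
    M, L = xL.shape
    rho_L = np.array([np.corrcoef(xL[m], yL[m])[0, 1] for m in range(M)])
    rho_K = np.empty((M, S))
    for m in range(M):
        for s in range(S):
            idx = rng.choice(L, size=K, replace=False)             # K-member sub-ensemble
            rho_K[m, s] = np.corrcoef(xL[m, idx], yL[m, idx])[0, 1]
    # Closed-form minimizer of sum_{m,s} (w * rho^K_{m,s} - rho^L_m)^2 over the scalar w:
    return np.sum(rho_K * rho_L[:, None]) / np.sum(rho_K ** 2)
```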
Example: On the Monsoon-Hadley multicloud model³
It is a Galerkin projection of the zonally symmetric β-plane primitive equations onto the barotropic and first two baroclinic modes, stochastically driven by a three-cloud model paradigm. Consider an observation model $h(x)$ that is similar to an RTM.
³ M. De La Chevrotière and B. Khouider, 2016.
Example of trained localization maps. [Figures: trained localization maps for Channel 3 and $\theta_1$, and for Channel 6 and $\theta_{eb}$.]
DA results
Correcting biased observation model error⁴
All Kalman-based DA methods assume an unbiased observation model error, e.g.,
$$y_i = h(x_i) + \eta_i, \qquad \eta_i \sim \mathcal{N}(0, R).$$
Suppose the operator $h$ is unknown. Instead, we are only given $\tilde h$; then
$$y_i = \tilde h(x_i) + b_i,$$
where we introduce a biased model error, $b_i = h(x_i) - \tilde h(x_i) + \eta_i$.
⁴ Berry & H, 2017.
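A toy numerical illustration of this bias, under a hypothetical pair of operators $h$ and $\tilde h$ chosen only for illustration: the residual $b_i = y_i - \tilde h(x_i)$ has nonzero mean, so treating it as $\mathcal{N}(0, R)$ noise in a Kalman update is inconsistent.

```python
import numpy as np

rng = np.random.default_rng(0)
h      = lambda x: np.sin(x) + 0.1 * x**2   # "true" observation operator (hypothetical)
htilde = lambda x: np.sin(x)                # the operator we are actually given

x   = rng.normal(size=100_000)              # hidden states
eta = rng.normal(scale=0.1, size=x.size)    # unbiased instrument noise ~ N(0, R)
y   = h(x) + eta                            # observations generated by the true h
b   = y - htilde(x)                         # b_i = h(x_i) - htilde(x_i) + eta_i

print(f"mean(b) = {b.mean():.3f}, std(b) = {b.std():.3f}")  # mean(b) ≈ 0.1 E[x^2], not zero
```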
Example: Basic radiative transfer model
Consider solutions of the stochastic cloud model⁵, $\{T(z), \theta_{eb}, q, f_d, f_s, f_c\}$. Based on these solutions, define a basic radiative transfer model as follows,
$$h_\nu(x) = \theta_{eb}\,\mathcal{T}_\nu(0) + \int_0^\infty T(z)\,\frac{\partial \mathcal{T}_\nu}{\partial z}(z)\, dz,$$
where $\mathcal{T}_\nu$ is the transmission between height $z$ and $\infty$, defined to depend on $q$. [Figure: the weighting function $\partial \mathcal{T}_\nu / \partial z$ as a function of height $z$.]
⁵ Khouider, Biello, Majda 2010.
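A minimal discretization sketch of $h_\nu$ on a vertical grid, assuming the transmission profile $\mathcal{T}_\nu(z)$ has already been computed from the moisture $q$ (its exact dependence is not specified here); the trapezoidal quadrature is an illustrative choice.

```python
import numpy as np

def basic_rtm(theta_eb, T, trans, z):
    """h_nu = theta_eb * Tnu(0) + int T(z) dTnu/dz dz, discretized on the grid z.
    T: temperature profile; trans: transmission profile Tnu(z) (precomputed from q)."""
    w = np.gradient(trans, z)                                   # weighting function dTnu/dz
    f = T * w
    integral = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(z))      # trapezoidal rule
    return theta_eb * trans[0] + integral
```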
Example: Basic radiative transfer model
Suppose the deep and stratiform cloud-top height is $z_d = 12$ km, while the cumulus cloud-top height is $z_c = 3$ km. Define $f = \{f_d, f_c, f_s\}$ and $x = \{T(z), \theta_{eb}, q\}$. Then the cloudy RTM is given by,
$$h_\nu(x, f) = (1 - f_d - f_s)\left(\theta_{eb}\,\mathcal{T}_\nu(0) + \int_0^{z_d} T(z)\,\frac{\partial \mathcal{T}_\nu}{\partial z}(z)\, dz\right) + (f_d + f_s)\left(T(z_d)\,\mathcal{T}_\nu(z_d) + \int_{z_d}^\infty T(z)\,\frac{\partial \mathcal{T}_\nu}{\partial z}(z)\, dz\right).$$
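Continuing the sketch above, one possible discretization of the cloudy RTM on the same grid; splitting the integral at $z_d$ with `searchsorted` and the handling of the cloud-top index are implementation assumptions.

```python
import numpy as np

def cloudy_rtm(theta_eb, T, trans, z, f_d, f_s, z_d=12.0):
    """Clear-sky term below the deep/stratiform cloud top z_d, weighted by (1 - f_d - f_s),
    plus cloud-top emission and the clear-sky term above z_d, weighted by (f_d + f_s)."""
    w = np.gradient(trans, z)                                   # weighting function dTnu/dz
    trap = lambda g, zz: np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(zz))
    k = np.searchsorted(z, z_d)                                 # first grid level at or above z_d
    clear  = theta_eb * trans[0] + trap((T * w)[:k + 1], z[:k + 1])
    cloudy = T[k] * trans[k] + trap((T * w)[k:], z[k:])
    return (1.0 - f_d - f_s) * clear + (f_d + f_s) * cloudy
```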