Improving EnKF with machine learning algorithms




  1. Improving EnKF with machine learning algorithms. John Harlim, Department of Mathematics and Department of Meteorology, The Pennsylvania State University. June 12, 2017.

  2. Overview: a supervised learning algorithm; an unsupervised learning algorithm (diffusion maps); learning the localization function of EnKF; learning a likelihood function. Application: correcting biased observation model error in DA.

  3. A supervised learning algorithm. The basic idea of a supervised learning algorithm is to train a map H : X → Y from a paired data set {x_i, y_i}, i = 1, ..., N.

  4. A supervised learning algorithm. The basic idea of a supervised learning algorithm is to train a map H : X → Y from a paired data set {x_i, y_i}, i = 1, ..., N. Remarks: ◮ The objective is to use the estimated map Ĥ to predict y_s = Ĥ(x_s) given new data x_s.

  5. A supervised learning algorithm. The basic idea of a supervised learning algorithm is to train a map H : X → Y from a paired data set {x_i, y_i}, i = 1, ..., N. Remarks: ◮ The objective is to use the estimated map Ĥ to predict y_s = Ĥ(x_s) given new data x_s. ◮ Various methods to estimate H include regression, SVM, k-NN, neural networks, etc.

  6. A supervised learning algorithm. The basic idea of a supervised learning algorithm is to train a map H : X → Y from a paired data set {x_i, y_i}, i = 1, ..., N. Remarks: ◮ The objective is to use the estimated map Ĥ to predict y_s = Ĥ(x_s) given new data x_s. ◮ Various methods to estimate H include regression, SVM, k-NN, neural networks, etc. ◮ For this talk, we focus on how to use regression in appropriate spaces to improve the EnKF (a minimal sketch follows below).
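
As a purely illustrative sketch of this setup, the snippet below fits Ĥ by least-squares regression on a polynomial basis from synthetic training pairs {x_i, y_i}, then predicts y_s = Ĥ(x_s) at new inputs. The data, basis choice, and names are my assumptions, not code from the talk.

```python
import numpy as np

# Synthetic training pairs {x_i, y_i}, i = 1, ..., N (hypothetical 1-D example).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(x.size)  # unknown map H plus noise

# Estimate H by least-squares regression on a polynomial basis.
degree = 5
A = np.vander(x, degree + 1)                     # design matrix [x^5, ..., x, 1]
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)

# Prediction at new data x_s: y_s = H_hat(x_s).
x_s = np.linspace(-1.0, 1.0, 11)
y_s = np.vander(x_s, degree + 1) @ coeffs
```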

  7. An unsupervised learning algorithm. Given a data set {x_i}, the main task is to learn a function ϕ(x_i) that can describe the data. [1] Coifman & Lafon 2006; Berry & Harlim 2016.

  8. An unsupervised learning algorithm. Given a data set {x_i}, the main task is to learn a function ϕ(x_i) that can describe the data. In this talk, I will focus on a nonlinear manifold learning algorithm, the diffusion maps [1]: given {x_i} ∈ M ⊂ R^n with a sampling measure q, the diffusion maps algorithm is a kernel-based method that produces orthonormal basis functions on the manifold, ϕ_k ∈ L²(M, q). [1] Coifman & Lafon 2006; Berry & Harlim 2016.

  9. An unsupervised learning algorithm. Given a data set {x_i}, the main task is to learn a function ϕ(x_i) that can describe the data. In this talk, I will focus on a nonlinear manifold learning algorithm, the diffusion maps [1]: given {x_i} ∈ M ⊂ R^n with a sampling measure q, the diffusion maps algorithm is a kernel-based method that produces orthonormal basis functions on the manifold, ϕ_k ∈ L²(M, q). These basis functions are solutions of an eigenvalue problem,
$$ q^{-1}\,\mathrm{div}\left(q \nabla \varphi_k(x)\right) = \lambda_k \varphi_k(x), $$
where the weighted Laplacian operator is approximated by an integral operator with appropriate normalization. [1] Coifman & Lafon 2006; Berry & Harlim 2016.
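
A minimal NumPy sketch of a diffusion-maps construction along these lines is shown below, assuming a fixed-bandwidth Gaussian kernel and the standard density normalization; the actual algorithms of Coifman & Lafon (2006) and Berry & Harlim (2016) use more careful (e.g., variable-bandwidth) normalizations, so this is only illustrative. The function name diffusion_basis and the circle usage example are my assumptions.

```python
import numpy as np

def diffusion_basis(X, eps, n_basis=10):
    """Minimal diffusion-maps sketch: X has shape (N, n), points on a manifold.

    Returns approximate eigenvalues/eigenfunctions (lambda_k, phi_k) of the
    weighted Laplacian, up to the normalization details of Coifman & Lafon.
    """
    # Pairwise squared distances and Gaussian kernel with bandwidth eps.
    D2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-D2 / (4.0 * eps))
    # Divide out the sampling density estimate so the limiting operator does not
    # depend on the sampling measure q.
    q_est = K.sum(axis=1)
    K1 = K / np.outer(q_est, q_est)
    # Row-normalize to a Markov matrix; its leading eigenvectors approximate phi_k.
    P = K1 / K1.sum(axis=1, keepdims=True)
    mu, vecs = np.linalg.eig(P)
    idx = np.argsort(-mu.real)[:n_basis]
    lam = (1.0 - mu.real[idx]) / eps             # approximate Laplacian eigenvalues
    return lam, vecs.real[:, idx]

# Usage: uniformly sampled data on the unit circle gives Fourier-like modes.
theta = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
X = np.column_stack([np.cos(theta), np.sin(theta)])
lam, phi = diffusion_basis(X, eps=0.01)
```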

  10. Examples. Example: for uniformly distributed data on a circle, we obtain the Fourier basis (plots of e^{ix}, e^{i2x}, e^{i3x}). Example: for Gaussian distributed data on the real line, we obtain the Hermite polynomials (plots of ϕ_1(x), ϕ_2(x), ϕ_3(x), estimate vs. true).

  11. Example: nonparametric basis functions estimated on a nontrivial manifold (surface plots of several such basis functions). Remark: essentially, one can view the diffusion maps as a method to learn a generalized Fourier basis on the manifold.

  12. Learning the localization function of EnKF. ◮ When the EnKF is performed with a small ensemble size, one way to alleviate spurious correlations is to employ a localization function.

  13. Learning the localization function of EnKF. ◮ When the EnKF is performed with a small ensemble size, one way to alleviate spurious correlations is to employ a localization function. ◮ For example, in the serial EnKF, for each scalar observation y_i, one “localizes” the Kalman gain,
$$ K = L_{x y_i} \circ X Y_i^\top \left( Y_i Y_i^\top + R \right)^{-1}, $$
with an empirically chosen localization function L_{x y_i} (Gaspari-Cohn, etc.), which requires some tuning.

  14. Learning the localization function of EnKF. ◮ When the EnKF is performed with a small ensemble size, one way to alleviate spurious correlations is to employ a localization function. ◮ For example, in the serial EnKF, for each scalar observation y_i, one “localizes” the Kalman gain,
$$ K = L_{x y_i} \circ X Y_i^\top \left( Y_i Y_i^\top + R \right)^{-1}, $$
with an empirically chosen localization function L_{x y_i} (Gaspari-Cohn, etc.), which requires some tuning. ◮ Let's use ideas from machine learning to train this localization function. The key idea is to find a map that takes poorly estimated correlations to accurately estimated correlations.
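
To make the formula above concrete, here is a minimal NumPy sketch of the Gaspari-Cohn function and of the Schur-localized gain for a single scalar observation in a serial EnKF. The function names and conventions (ensemble perturbations with the mean removed, scalar observation error variance) are my assumptions, not the talk's code.

```python
import numpy as np

def gaspari_cohn(r):
    """Gaspari-Cohn fifth-order localization function; r = distance / cutoff."""
    r = np.abs(np.asarray(r, dtype=float))
    gc = np.zeros_like(r)
    m1, m2 = r <= 1.0, (r > 1.0) & (r < 2.0)
    r1, r2 = r[m1], r[m2]
    gc[m1] = -0.25 * r1**5 + 0.5 * r1**4 + 0.625 * r1**3 - (5.0 / 3.0) * r1**2 + 1.0
    gc[m2] = ((1.0 / 12.0) * r2**5 - 0.5 * r2**4 + 0.625 * r2**3
              + (5.0 / 3.0) * r2**2 - 5.0 * r2 + 4.0 - 2.0 / (3.0 * r2))
    return gc

def localized_gain(X, Y_i, R_i, L_i):
    """Localized Kalman gain for one scalar observation in a serial EnKF.

    X   : (n, K) state ensemble perturbations (ensemble mean removed)
    Y_i : (K,)   observation-space ensemble perturbations for observation y_i
    R_i : scalar observation error variance
    L_i : (n,)   localization weights L_{x y_i} for each state variable
    """
    n_ens = Y_i.size
    cov_xy = X @ Y_i / (n_ens - 1)        # sample covariance X Y_i^T
    var_y = Y_i @ Y_i / (n_ens - 1)       # sample variance  Y_i Y_i^T
    return L_i * cov_xy / (var_y + R_i)   # Schur (elementwise) localization

# Usage: 40 state variables on a ring, one observation at grid point 0,
# localization from Gaspari-Cohn with a cutoff of 5 grid points.
n, n_ens = 40, 10
rng = np.random.default_rng(2)
X = rng.standard_normal((n, n_ens))
X -= X.mean(axis=1, keepdims=True)
dist = np.minimum(np.arange(n), n - np.arange(n))
K_loc = localized_gain(X, X[0], R_i=0.5, L_i=gaspari_cohn(dist / 5.0))
```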

  15. Learning the localization map [2]. Given a set of large-ensemble EnKF solutions, {x_m^{a,k}}, k = 1, ..., L, m = 1, ..., M, as a training data set, where L is large enough so that the correlation ρ^L_{ij} ≈ ρ(x_i, y_j) is accurate. [2] De La Chevrotière & Harlim, 2017.

  16. Learning the localization map [2]. Given a set of large-ensemble EnKF solutions, {x_m^{a,k}}, k = 1, ..., L, m = 1, ..., M, as a training data set, where L is large enough so that the correlation ρ^L_{ij} ≈ ρ(x_i, y_j) is accurate. ◮ Operationally, we wish to run the EnKF with K ≪ L ensemble members. Our goal is then to train a map that transforms the subsampled correlation ρ^K_{ij} into the accurate correlation ρ^L_{ij}. [2] De La Chevrotière & Harlim, 2017.

  17. Learning the localization map [2]. Given a set of large-ensemble EnKF solutions, {x_m^{a,k}}, k = 1, ..., L, m = 1, ..., M, as a training data set, where L is large enough so that the correlation ρ^L_{ij} ≈ ρ(x_i, y_j) is accurate. ◮ Operationally, we wish to run the EnKF with K ≪ L ensemble members. Our goal is then to train a map that transforms the subsampled correlation ρ^K_{ij} into the accurate correlation ρ^L_{ij}. ◮ Basically, we consider the following optimization problem:
$$ \min_{L_{x_i y_j}} \int_{[-1,1]} \int_{[-1,1]} \left( L_{x_i y_j}\, \rho^K_{ij} - \rho^L_{ij} \right)^2 p\!\left(\rho^K_{ij} \mid \rho^L_{ij}\right) p\!\left(\rho^L_{ij}\right) d\rho^K_{ij}\, d\rho^L_{ij} \;\overset{\mathrm{MC}}{\approx}\; \min_{L_{x_i y_j}} \frac{1}{MS} \sum_{m,s=1}^{M,S} \left( L_{x_i y_j}\, \rho^K_{ij,m,s} - \rho^L_{ij,m} \right)^2, $$
where ρ^L_{ij,m} ∼ p(ρ^L_{ij}) and ρ^K_{ij,m,s} ∼ p(ρ^K_{ij} | ρ^L_{ij}) is an estimated correlation using only K out of the L training members. [2] De La Chevrotière & Harlim, 2017.
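
Reading the Monte Carlo objective above as a separate scalar least-squares problem for each pair (i, j), the minimizer has the closed form L_{x_i y_j} = Σ_{m,s} ρ^K_{ij,m,s} ρ^L_{ij,m} / Σ_{m,s} (ρ^K_{ij,m,s})², which the sketch below computes. The array shapes and names are my assumptions; De La Chevrotière & Harlim (2017) may parameterize the localization map differently.

```python
import numpy as np

def learn_localization_weights(rho_L, rho_K):
    """Monte Carlo least-squares estimate of the localization weights.

    rho_L : (M, P)    accurate correlations rho^L_ij from the L-member ensemble,
                      for M assimilation cycles and P state-observation pairs
    rho_K : (M, S, P) correlations rho^K_ij estimated from S subsamples of
                      K << L members each

    For each pair, minimizing (1/MS) * sum_{m,s} (L * rho^K - rho^L)^2 over the
    scalar L gives L = sum(rho^K * rho^L) / sum(rho^K ** 2).
    """
    num = np.einsum('msp,mp->p', rho_K, rho_L)
    den = np.einsum('msp,msp->p', rho_K, rho_K)
    return num / den

# Usage with synthetic shapes: M = 200 cycles, S = 50 subsamples, P = 1 pair.
rng = np.random.default_rng(3)
M, S, P = 200, 50, 1
rho_L = np.tanh(rng.standard_normal((M, P)))
rho_K = np.clip(rho_L[:, None, :] + 0.3 * rng.standard_normal((M, S, P)), -1, 1)
weights = learn_localization_weights(rho_L, rho_K)
```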

  18. Example: the Monsoon-Hadley multicloud model [3]. It is a Galerkin projection of the zonally symmetric β-plane primitive equations onto the barotropic and first two baroclinic modes, stochastically driven by a three-cloud model paradigm. Consider an observation model h(x) that is similar to an RTM. [3] M. De La Chevrotière and B. Khouider, 2016.

  19. Example of trained localization maps (figures): Channel 3 and θ_1; Channel 6 and θ_eb.

  20. DA results

  21. Correcting biased observation model error [4]. All Kalman-based DA methods assume an unbiased observation model error, e.g.,
$$ y_i = h(x_i) + \eta_i, \qquad \eta_i \sim \mathcal{N}(0, R). $$
Suppose the operator h is unknown. Instead, we are only given h̃; then
$$ y_i = \tilde{h}(x_i) + b_i, $$
where we introduce a biased model error, b_i = h(x_i) − h̃(x_i) + η_i. [4] Berry & Harlim, 2017.
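
A tiny synthetic illustration of this point: if observations are generated with a true operator h but interpreted through an available operator h̃, the residual b_i = y_i − h̃(x_i) has a nonzero mean, violating the unbiased N(0, R) assumption. The operators and numbers below are made up for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

h_true = lambda x: x + 0.5 * x**2    # hypothetical true observation operator h
h_avail = lambda x: x                # the operator h~ we are actually given
R = 0.1

x = rng.standard_normal(10_000)
y = h_true(x) + np.sqrt(R) * rng.standard_normal(x.size)   # y_i = h(x_i) + eta_i

# Model error relative to the available operator: b_i = h(x_i) - h~(x_i) + eta_i.
b = y - h_avail(x)
print(b.mean())   # approximately 0.25, i.e. biased, not N(0, R)
```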

  22. Example: basic radiative transfer model. Consider solutions of the stochastic cloud model [5], {T(z), θ_eb, q, f_d, f_s, f_c}. Based on these solutions, define a basic radiative transfer model as follows,
$$ h_\nu(x) = \theta_{eb}\, T_\nu(0) + \int_0^{\infty} T(z)\, \frac{\partial T_\nu}{\partial z}(z)\, dz, $$
where T_ν is the transmission between height z and ∞, defined to depend on q. The weighting function ∂T_ν/∂z is defined as shown in the figure (weighting function ∂T_ν/∂z versus height z). [5] Khouider, Biello, Majda 2010.

  23. Example: basic radiative transfer model. Suppose the deep and stratiform cloud top height is z_d = 12 km, while the cumulus cloud top height is z_c = 3 km. Define f = {f_d, f_c, f_s} and x = {T(z), θ_eb, q}. Then the cloudy RTM is given by,
$$ h_\nu(x, f) = (1 - f_d - f_s)\left[ \theta_{eb}\, T_\nu(0) + \int_0^{z_d} T(z)\, \frac{\partial T_\nu}{\partial z}(z)\, dz \right] + (f_d + f_s)\left[ T(z_d)\, T_\nu(z_d) + \int_{z_d}^{\infty} T(z)\, \frac{\partial T_\nu}{\partial z}(z)\, dz \right]. $$
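
A minimal quadrature sketch of this cloudy RTM observation operator is given below: the integrals are replaced by trapezoidal sums on a height grid, the top grid level stands in for infinity, and the transmission profile T_ν(z) is supplied directly rather than computed from q. The profiles, function name, and signature are my assumptions, not the talk's implementation.

```python
import numpy as np

def cloudy_rtm(z, T, T_nu, theta_eb, f_d, f_s, z_d=12.0):
    """Quadrature sketch of the cloudy RTM observation operator.

    z        : (n,) heights [km], increasing; the last level stands in for infinity
    T        : (n,) temperature profile T(z)
    T_nu     : (n,) transmission profile T_nu(z) for one channel (depends on q)
    theta_eb : boundary-layer equivalent potential temperature
    f_d, f_s : deep and stratiform cloud area fractions (cloud top at z_d)
    """
    i_d = np.searchsorted(z, z_d)            # first grid index at/above cloud top
    dT_nu = np.gradient(T_nu, z)             # weighting function dT_nu/dz
    clear = theta_eb * T_nu[0] + np.trapz(T[:i_d + 1] * dT_nu[:i_d + 1], z[:i_d + 1])
    cloud = T[i_d] * T_nu[i_d] + np.trapz(T[i_d:] * dT_nu[i_d:], z[i_d:])
    return (1.0 - f_d - f_s) * clear + (f_d + f_s) * cloud

# Usage with made-up profiles on a 0-16 km grid.
z = np.linspace(0.0, 16.0, 81)
T = 300.0 - 6.5 * z                          # simple constant-lapse-rate profile
T_nu = np.exp(-(16.0 - z) / 8.0)             # hypothetical transmission profile
print(cloudy_rtm(z, T, T_nu, theta_eb=340.0, f_d=0.1, f_s=0.2))
```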
