Nonlinear filtering with local couplings

Alessio Spantini, Ricardo Baptista, Youssef Marzouk
Massachusetts Institute of Technology, Department of Aeronautics & Astronautics

ISDA 2019, January 22nd, 2019
Assimilation step in ensemble filtering algorithms

At any assimilation time k, we have a Bayesian inverse problem: the prior $\pi_X$ is updated to the posterior $\pi_{X|Y=y^*}$.

◮ $\pi_X$ is the forecast distribution on $\mathbb{R}^n$
◮ $\pi_{Y|X}$ is the likelihood of the observations $Y \in \mathbb{R}^d$
◮ $\pi_{X|Y=y^*}$ is the filtering distribution for a realization $y^*$ of the data

Goal: sample the posterior given only M prior samples $x^1, \ldots, x^M$.
Inference as a transportation of measures

◮ Seek a map T that pushes forward the prior to the posterior [Moselhy, 2012]:
$$ (x^1, \ldots, x^M) \sim \pi_X \;\Longrightarrow\; (T(x^1), \ldots, T(x^M)) \sim \pi_{X|Y=y^*} $$
◮ The map induces a coupling between the prior and posterior measures

How to construct a “good” coupling from samples?
Joint space ⟹ convex problem

◮ Construct a coupling T between the joint $\pi_{Y,X}$ and the posterior $\pi_{X|Y=y^*}$
◮ T can be computed via convex optimization given samples from $\pi_{Y,X}$
◮ Sample $\pi_{Y,X}$ using the forecast ensemble and the likelihood: $(y^i, x^i)$ with $y^i \sim \pi_{Y|X=x^i}$
◮ Intuition: a generalization of the perturbed-observation EnKF
Couple the joint with a standard normal

We seek $T : \mathbb{R}^{d+n} \to \mathbb{R}^n$ mapping the joint $\pi_{Y,X}$ to the posterior $\pi_{X|Y=y^*}$.

The problem of finding T can be solved by computing a Knothe–Rosenblatt rearrangement
$S : \mathbb{R}^{d+n} \to \mathbb{R}^{d+n}$ between $\pi_{Y,X}$ and $N(0, I_{d+n})$.

◮ In a few slides: we will show how to derive T from S
Knothe–Rosenblatt (KR) rearrangement

◮ Definition: for any pair of densities $\pi, \eta$ on $\mathbb{R}^n$, there exists a unique triangular and monotone map $S : \mathbb{R}^n \to \mathbb{R}^n$ such that $S_\sharp \pi = \eta$
◮ Triangular function (a nonlinear generalization of a triangular matrix):
$$ S(x_1, \ldots, x_n) = \begin{bmatrix} S^1(x_1) \\ S^2(x_1, x_2) \\ \vdots \\ S^n(x_1, x_2, \ldots, x_n) \end{bmatrix} $$
◮ Existence stems from general factorization properties of a density, e.g.,
$$ \pi = \pi_{X_1}\, \pi_{X_2|X_1} \cdots \pi_{X_n|X_1,\ldots,X_{n-1}} $$
Triangular maps enable conditional simulation

$$ S(x_1, \ldots, x_n) = \begin{bmatrix} S^1(x_1) \\ S^2(x_1, x_2) \\ \vdots \\ S^n(x_1, x_2, \ldots, x_n) \end{bmatrix} $$

◮ Each component $S^k$ links marginal conditionals of $\pi$ and $\eta$
◮ For instance, if $\eta = N(0, I)$, then for all $(x_1, \ldots, x_{k-1}) \in \mathbb{R}^{k-1}$, the map
$\xi \mapsto S^k(x_1, \ldots, x_{k-1}, \xi)$ pushes $\pi_{X_k|X_{1:k-1}}(\,\cdot \mid x_{1:k-1})$ to $N(0, 1)$
◮ Simulate the conditional $\pi_{X_k|X_{1:k-1}}$ by inverting the 1-D map $\xi \mapsto S^k(x_{1:k-1}, \xi)$ at Gaussian samples (this requires the triangular structure), as sketched below
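A minimal sketch of this conditional-simulation step, assuming a monotone component $S^k$ is available as a Python callable. The function name, the bisection-based inversion via `brentq`, and the example map are illustrative assumptions, not the authors' implementation; a bracket wide enough to contain the root is assumed.

```python
import numpy as np
from scipy.optimize import brentq

def sample_conditional(S_k, x_prev, n_samples=1000, bracket=(-20.0, 20.0), rng=None):
    """Draw samples from pi(x_k | x_{1:k-1}) by inverting the monotone 1-D map
    xi -> S_k(x_prev, xi) at standard-normal samples."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(n_samples)          # reference samples from N(0, 1)
    samples = np.empty(n_samples)
    for i, z_i in enumerate(z):
        # Root of S_k(x_prev, xi) - z_i = 0; brentq applies since the map is monotone in xi
        samples[i] = brentq(lambda xi: S_k(x_prev, xi) - z_i, *bracket)
    return samples

# Example with a simple monotone component S_k(x_prev, xi) = 0.5 * sum(x_prev) + xi**3 + xi
x_prev = np.array([0.3, -1.2])
draws = sample_conditional(lambda xp, xi: 0.5 * xp.sum() + xi**3 + xi, x_prev)
```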
Back to filtering: the analysis map

◮ We are interested in the KR map S that pushes $\pi_{Y,X}$ to $N(0, I_{d+n})$
◮ The rearrangement has a typical block structure
$$ S(y, x) = \begin{bmatrix} S^Y(y) \\ S^X(y, x) \end{bmatrix}, $$
which suggests two properties:
  $S^X$ pushes $\pi_{Y,X}$ to $N(0, I_n)$
  $\xi \mapsto S^X(y^*, \xi)$ pushes $\pi_{X|Y=y^*}$ to $N(0, I_n)$
◮ The analysis map that pushes $\pi_{Y,X}$ to $\pi_{X|Y=y^*}$ is then given by the composition sketched below:
$$ T(y, x) = S^X(y^*, \cdot)^{-1} \circ S^X(y, x) $$
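A hedged sketch of this composition, assuming $S^X$ is supplied as a list of monotone triangular components $S^X_k(y, x_{1:k})$ (all names are illustrative). The inverse of $\xi \mapsto S^X(y^*, \xi)$ is evaluated one component at a time, again by 1-D root finding with an assumed bracket.

```python
import numpy as np
from scipy.optimize import brentq

def analysis_map(SX, y_star, y, x, bracket=(-20.0, 20.0)):
    """Evaluate T(y, x) = S^X(y_star, .)^{-1} o S^X(y, x) for a triangular, monotone S^X.

    SX is a list of callables; SX[k](y, x[:k+1]) returns the k-th output component."""
    n = len(x)
    # Forward pass: map the joint sample (y, x) to the reference space
    z = np.array([SX[k](y, x[:k + 1]) for k in range(n)])
    # Inverse pass: solve S^X(y_star, x_a) = z component by component (triangular structure)
    x_a = np.empty(n)
    for k in range(n):
        f = lambda xi: SX[k](y_star, np.append(x_a[:k], xi)) - z[k]
        x_a[k] = brentq(f, *bracket)
    return x_a
```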
The stochastic map algorithm

Transport map filter:
1. Compute the forecast ensemble $x^1, \ldots, x^M$
2. Generate samples $(y^i, x^i)$ from $\pi_{Y,X}$ with $y^i \sim \pi_{Y|X=x^i}$
3. Build an estimator $\hat{T}$ of T
4. Compute the analysis ensemble as $x^i_a = \hat{T}(y^i, x^i)$ for $i = 1, \ldots, M$

A sketch of one assimilation cycle follows.
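A compact sketch of one cycle under these four steps. The forward model `forward`, the observation operator, and the map-fitting routine `fit_analysis_map` (returning the estimator $\hat T$) are user-supplied placeholders, not the authors' code.

```python
import numpy as np

def assimilation_cycle(ensemble, forward, obs_operator, obs_noise_std, y_obs,
                       fit_analysis_map, rng=None):
    """One cycle of the stochastic map filter; ensemble is an (M, n) array of states."""
    rng = np.random.default_rng() if rng is None else rng
    # 1. Forecast step: propagate each member through the dynamics
    X_f = np.array([forward(x) for x in ensemble])                 # (M, n)
    # 2. Sample the joint pi_{Y,X}: simulate observations from the likelihood
    Y_f = np.array([obs_operator(x) for x in X_f])
    Y_f += obs_noise_std * rng.standard_normal(Y_f.shape)          # (M, d)
    # 3. Estimate the analysis map T from the joint samples (y^i, x^i)
    T_hat = fit_analysis_map(Y_f, X_f, y_obs)
    # 4. Analysis step: move each member with the estimated map
    X_a = np.array([T_hat(y_i, x_i) for y_i, x_i in zip(Y_f, X_f)])
    return X_a
```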
How to build an estimator for the analysis map

◮ Recall the form of the analysis map
$$ T(y, x) = S^X(y^*, \cdot)^{-1} \circ S^X(y, x), $$
where
$$ S(y, x) = \begin{bmatrix} S^Y(y) \\ S^X(y, x) \end{bmatrix}, \qquad S_\sharp \pi_{Y,X} = N(0, I_{d+n}). $$
◮ We propose the following estimator $\hat{T}$ of T:
$$ \hat{T}(y, x) = \hat{S}^X(y^*, \cdot)^{-1} \circ \hat{S}^X(y, x), $$
where $\hat{S}$ is a maximum likelihood estimator of S
◮ The MLE of S can be computed via convex optimization [Parno, 2014]
Estimation of the KR rearrangement from samples

Given M samples $x^1, \ldots, x^M$ from a density $\pi$ on $\mathbb{R}^n$, we want to estimate the KR rearrangement S that pushes forward $\pi$ to $N(0, I_n)$.

◮ Constrained MLE for S:
$$ \hat{S} \in \arg\min_{S \in \mathcal{H}} \; \frac{1}{M} \sum_{i=1}^{M} -\log \underbrace{S^{-1}_\sharp \eta}_{\text{pullback}}(x^i), \qquad \eta = N(0, I_n), $$
where $\mathcal{H}$ is an approximation space for the rearrangement
◮ Each component $\hat{S}^k$ of $\hat{S}$ can be computed in parallel via differentiable convex optimization (see the sketch below):
$$ \hat{S}^k \in \arg\min_{S^k \in \mathcal{H}^k} \; \frac{1}{M} \sum_{i=1}^{M} \left[ \frac{1}{2} S^k(x^i)^2 - \log \partial_k S^k(x^i) \right] $$
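A hedged sketch of the single-component problem with a deliberately simple parameterization $S^k(x_{1:k}) = c + a^\top x_{1:k-1} + b\, x_k$ with $b > 0$; the parameterization, function names, and optimizer choice are illustrative, not the authors'. In this parameterization the sample objective is jointly convex in $(c, a, b)$.

```python
import numpy as np
from scipy.optimize import minimize

def fit_component(X, k):
    """MLE for one KR component with S^k(x_{1:k}) = c + a . x_{1:k-1} + b * x_k,  b > 0.

    The sample objective mean(0.5 * S^k(x)^2 - log(b)) is convex in (c, a, b)."""
    Xprev, xk = X[:, :k], X[:, k]                      # (M, k) and (M,)

    def objective(theta):
        c, a, b = theta[0], theta[1:k + 1], theta[k + 1]
        s = c + Xprev @ a + b * xk
        return np.mean(0.5 * s**2) - np.log(b)

    theta0 = np.zeros(k + 2); theta0[-1] = 1.0
    bounds = [(None, None)] * (k + 1) + [(1e-6, None)]  # keep b > 0 (monotonicity in x_k)
    res = minimize(objective, theta0, method="L-BFGS-B", bounds=bounds)
    c, a, b = res.x[0], res.x[1:k + 1], res.x[k + 1]
    return lambda x: c + x[:k] @ a + b * x[k]

# Example: fit all components from M samples of a correlated Gaussian (each fit is independent)
M, n = 500, 3
L = np.array([[1.0, 0, 0], [0.5, 1, 0], [0, 0.5, 1]])
X = np.random.default_rng(0).standard_normal((M, n)) @ L.T
S_hat = [fit_component(X, k) for k in range(n)]
```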
Parameterization of the map

$$ \hat{S}^k \in \arg\min_{S^k \in \mathcal{H}^k} \; \frac{1}{M} \sum_{i=1}^{M} \left[ \frac{1}{2} S^k(x^i)^2 - \log \partial_k S^k(x^i) \right] $$

◮ Iterative optimization is not needed for nonlinear separable parameterizations of the form $\hat{S}^k(x_{1:k}) = x_k + g(x_{1:k-1})$: since $\partial_k \hat{S}^k = 1$, the problem reduces to linear regression (see the sketch below)
◮ In general, convex optimization is needed (e.g., Newton’s method)
◮ Connection to EnKF: a linear parameterization of $\hat{S}^k$ yields a particular form of the EnKF with “perturbed observations”
◮ The choice of approximation space controls the bias–variance tradeoff of $\hat{S}$
◮ Richer parameterizations yield less bias, but potentially higher variance

Strategy in high dimensions: depart gradually from the linear ansatz by introducing local nonlinearities + regularization
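A minimal illustration of why the separable form needs no iterative solver, assuming g is linear in a user-chosen feature map `features` (the names and the quadratic features are illustrative; any nonlinear features, such as the radial basis functions used later, work the same way).

```python
import numpy as np

def fit_separable_component(X, k, features):
    """Fit S^k(x_{1:k}) = x_k + g(x_{1:k-1}) with g(x) = w . features(x).

    Since d/dx_k S^k = 1, the log-derivative term drops out and the MLE objective
    mean(0.5 * (x_k + g(x_{1:k-1}))^2) is an ordinary least-squares problem in w."""
    Phi = np.array([features(x) for x in X[:, :k]])       # (M, p) feature matrix
    w, *_ = np.linalg.lstsq(Phi, -X[:, k], rcond=None)    # solve min_w ||Phi w + x_k||^2
    return lambda x: x[k] + features(x[:k]) @ w

# Example usage with quadratic features of the preceding variables (hypothetical choice)
features = lambda x: np.concatenate(([1.0], x, x**2))
X = np.random.default_rng(1).standard_normal((500, 4))
S3_hat = fit_separable_component(X, 3, features)
```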
The map is easy to “localize” in high dimensions

◮ Regularize the estimator $\hat{S}$ of S by imposing sparsity, e.g.,
$$ \hat{S}(x_1, \ldots, x_4) = \begin{bmatrix} \hat{S}^1(x_1) \\ \hat{S}^2(x_1, x_2) \\ \hat{S}^3(x_2, x_3) \\ \hat{S}^4(x_3, x_4) \end{bmatrix} $$
◮ The sparsity of the k-th component of S depends on the sparsity of the marginal conditional function $\pi_{X_k|X_{1:k-1}}(x_k \mid x_{1:k-1})$
◮ Quick heuristic: let each $\hat{S}^k$ depend on the variables $(x_j)_{j<k}$ that are within a distance $\ell$ of $x_k$ in state space (see the sketch below); estimate the optimal $\ell$ offline
◮ An explicit link between the sparsity of a nonlinear S and conditional independence in non-Gaussian graphical models can be found in [Spantini et al., Inference via low-dimensional couplings, 2018]
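A small sketch of the distance-based heuristic for a one-dimensional periodic state (as in the Lorenz-96 example that follows); the function name and the periodic distance are illustrative assumptions.

```python
def local_variables(k, ell, n, periodic=True):
    """Indices j < k whose state-space distance to component k is at most ell.

    These are the only arguments (besides x_k itself) kept in the sparse component S^k."""
    def dist(i, j):
        d = abs(i - j)
        return min(d, n - d) if periodic else d
    return [j for j in range(k) if dist(j, k) <= ell]

# Example: neighbors retained for component k = 20 of a 40-dimensional cyclic state, ell = 2
print(local_variables(20, ell=2, n=40))   # -> [18, 19]
```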
Lorenz-96 in the chaotic regime (40-dimensional state)

◮ A hard test-case configuration [Bengtsson et al., 2003], sketched in code below:
$$ \frac{\mathrm{d} Z_j}{\mathrm{d} t} = (Z_{j+1} - Z_{j-2})\, Z_{j-1} - Z_j + F, \qquad j = 1, \ldots, 40 $$
$$ Y_j = Z_j + E_j, \qquad j = 1, 3, 5, \ldots, 39 $$
◮ $F = 8$ (chaotic) and $E_j \sim N(0, 0.5)$ (small noise for the PF)
◮ Time between observations: $\Delta_{\mathrm{obs}} = 0.4$ (large)
◮ Results computed over 2000 assimilation cycles

RMSE     | #particles: 400  | #particles: 200
         | EnKF   | MapF    | EnKF   | MapF
median   | 0.73   | 0.58    | 0.73   | 0.66
mean     | 0.78   | 0.61    | 0.80   | 0.71
std      | 0.24   | 0.18    | 0.31   | 0.26
spread   | 0.78   | 0.69    | 0.82   | 0.75

◮ The nonlinear filter is ∼25% more accurate in RMSE than the EnKF
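A hedged sketch of this experimental setup; the integrator choice, step count, function names, and the reading of $N(0, 0.5)$ as a variance are my assumptions, not the authors' configuration.

```python
import numpy as np

F = 8.0                      # forcing (chaotic regime)
N = 40                       # state dimension
OBS_NOISE_VAR = 0.5          # reading E_j ~ N(0, 0.5) as a variance (assumption)
DT_OBS = 0.4                 # time between observations

def lorenz96_rhs(z):
    """dZ_j/dt = (Z_{j+1} - Z_{j-2}) Z_{j-1} - Z_j + F with cyclic indices."""
    return (np.roll(z, -1) - np.roll(z, 2)) * np.roll(z, 1) - z + F

def forecast(z, dt=DT_OBS, n_steps=100):
    """Propagate the state between observation times with fixed-step RK4 (illustrative choice)."""
    h = dt / n_steps
    for _ in range(n_steps):
        k1 = lorenz96_rhs(z)
        k2 = lorenz96_rhs(z + 0.5 * h * k1)
        k3 = lorenz96_rhs(z + 0.5 * h * k2)
        k4 = lorenz96_rhs(z + h * k3)
        z = z + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return z

def observe(z, rng):
    """Observe every other component (j = 1, 3, ..., 39) with additive Gaussian noise."""
    return z[::2] + np.sqrt(OBS_NOISE_VAR) * rng.standard_normal(N // 2)
```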
Lorenz-96: details on the filtering approximation

◮ Observations were assimilated one at a time
◮ Sparsity of the map was imposed with a 5-way interaction model (figure)
◮ Fully separable and nonlinear parameterization of each component (sketched below):
$$ \hat{S}^k(x_{j_1}, \ldots, x_{j_p}, x_k) = \psi(x_{j_1}) + \ldots + \psi(x_{j_p}) + \psi(x_k), $$
where $\psi(x) = a_0 + a_1 x + \sum_{i>1} a_i \exp\!\left(-(x - c_i)^2/\sigma\right)$.
◮ Much more general parameterizations are of course possible!
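A minimal sketch of one such univariate function $\psi$: a linear term plus Gaussian radial basis functions. The centers, width, and coefficient values below are placeholders, not fitted values.

```python
import numpy as np

def make_psi(a, centers, sigma):
    """psi(x) = a[0] + a[1]*x + sum_{i>=2} a[i] * exp(-(x - centers[i-2])**2 / sigma)."""
    a = np.asarray(a)
    centers = np.asarray(centers)
    def psi(x):
        rbf = np.exp(-(x - centers) ** 2 / sigma)
        return a[0] + a[1] * x + a[2:] @ rbf
    return psi

# Hypothetical coefficients: 4 RBF centers spread over the typical range of a state variable
psi = make_psi(a=[0.0, 1.0, 0.3, -0.2, 0.1, 0.05], centers=[-5.0, 0.0, 5.0, 10.0], sigma=4.0)
value = psi(2.5)
```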
A tradeoff between nonlinearities and ensemble size

(Figure: average RMSE, between roughly 0.6 and 0.9, versus the number of particles, from 100 to 600, for the EnKF and for maps of order 1, 2, and 3.)
Discussion

Nonlinear filtering with local couplings:
◮ A nonlinear generalization of the EnKF: move the ensemble members via local nonlinear transport maps, with no weights or degeneracy
◮ Learn non-Gaussian features via convex optimization
◮ Easy to localize in high dimensions (sparse updates)

Extensions and open issues:
◮ There exists a square-root version of the filter
◮ Generalization to nonlinear smoothing
◮ How much nonlinearity in the update vs. ensemble size & structure?
◮ How to regularize the estimation? (e.g., LASSO)
◮ Accuracy and stability: is it possible to establish consistency by adapting the complexity of the maps to the forecast ensembles?