
Large-Scale Data Fusion for Improved Model Simulation and Predictability
Ahmed Attia, Mathematics and Computer Science Division (MCS), Argonne National Laboratory (ANL)
October 31, 2018, ANL.
Thanks to: Adrian Sandu (VTech), Emil M. Constantinescu (ANL/UChicago), Alen Alexanderian.


1. Data Assimilation: Problem Setup
◮ Filtering (3D-DA): assimilate a single observation at a time, i.e., estimate $x_k \,|\, y_k$.
◮ Smoothing (4D-DA): assimilate multiple observations at once, i.e., estimate $x_0 \,|\, y_1, y_2, \ldots, y_m$.
[Schematic: each assimilation cycle propagates the prior IC with the forward model $\mathcal{M}: t_0 \to t_k$, confronts it with the observations and their likelihood, and (for smoothing) uses the adjoint model $\mathcal{M}^*: t_k \to t_0$ to produce the corrected IC (the analysis).]

2-3. Data Assimilation: Probabilistic Assumptions & Solvers
Simplifying assumptions are imposed on the error distributions; typically, errors are assumed to be Gaussian (easy, tractable, ...).
◮ The Gaussian framework:
  - Prior: $x^b - x^{true} \sim \mathcal{N}(0, B)$
  - Likelihood: $y - \mathcal{H}(x^{true}) \sim \mathcal{N}(0, R)$
  - Posterior: $x^a - x^{true} \sim \mathcal{N}(0, A)$
◮ Approaches & solvers:
  - Variational: solve an optimization problem, e.g., minimize the negative log of the posterior, to obtain an analysis state.
  - Ensemble: apply Bayes' theorem with a Monte Carlo representation of errors and states/parameters.
  Filtering (3D): 3DVar (variational); EnKF (ensemble Kalman filter), MLEF (maximum-likelihood ensemble filter), IEnKF (iterative EnKF), PF (particle filters), MCMC (Markov chain Monte Carlo sampling), ...
  Smoothing (4D): 4DVar (variational); EnKS (ensemble Kalman smoother), MCMC, ...

4-5. Data Assimilation: Challenges
◮ Dimensionality:
  - Model state space: $N_{state} \sim 10^{8-12}$
  - Observation space: $N_{obs} \ll N_{state}$
  - Ensemble size: $N_{ens} \sim 100$
◮ Gaussian framework:
  - A strong assumption that holds for linear dynamics and a linear observation operator $\mathcal{H}$.
  - EnKF is the most popular filter for "linear-Gaussian" settings, but suffers from:
    → sampling errors → spurious long-range correlations,
    → rank deficiency → ensemble collapse and filter divergence.
  - $\mathcal{H}$ is becoming increasingly nonlinear, leading to non-Gaussian posteriors.
◮ Non-Gaussian DA:
  - PF: degeneracy.
  - MCMC: the gold standard, yet computationally unaffordable.
◮ Resampling family: gradient-based MCMC (e.g., HMC) filtering and smoothing algorithms for non-Gaussian DA.

6. Outline
Motivation
Bayesian Inversion & Data Assimilation
A Resampling Family for Non-Gaussian DA
  1- HMC sampling filter
  2- Cluster sampling filters
  3- HMC sampling smoother
  4- Reduced-order HMC smoother
Optimal Design of Experiments (ODE)
  Bayesian inversion & sensor placement
  Goal-Oriented approach for ODE (GOODE)
EnKF Inflation & Localization
  OED-based inflation & localization
Concluding Remarks & Future Plans

7. Hybrid Monte-Carlo (HMC) sampling
To draw samples $\{x(e)\}_{e=1,2,\ldots}$ from $\pi(x) \propto e^{-\mathcal{J}(x)}$:
  - $x$ is viewed as a position variable,
  - add a synthetic "momentum" $p \sim \mathcal{N}(0, M)$, sample the joint PDF, then discard $p$,
  - generate a Markov chain with invariant distribution $\propto \exp(-H(p, x))$,
  - HMC proposal: a symplectic integrator plays the role of the proposal density.
• The Hamiltonian (kinetic + potential energy):
  $H(p, x) = \frac{1}{2} p^T M^{-1} p + \mathcal{J}(x) = \frac{1}{2} p^T M^{-1} p - \log(\pi(x))$
• The Hamiltonian dynamics (integrated with a symplectic integrator):
  $\frac{dx}{dt} = \nabla_p H = M^{-1} p, \qquad \frac{dp}{dt} = -\nabla_x H = -\nabla_x \mathcal{J}(x)$
• The canonical PDF of $(p, x)$:
  $\propto \exp(-H(p, x)) = \exp\left(-\frac{1}{2} p^T M^{-1} p - \mathcal{J}(x)\right) = \exp\left(-\frac{1}{2} p^T M^{-1} p\right) \pi(x)$

8. Hybrid Monte-Carlo (HMC) sampling: Symplectic integrators
◮ To advance the solution of the Hamiltonian equations from pseudo-time $t_k$ to $t_{k+1} = t_k + h$:
1. Position Verlet integrator:
  $x_{k+1/2} = x_k + \frac{h}{2} M^{-1} p_k, \quad p_{k+1} = p_k - h \nabla_x \mathcal{J}(x_{k+1/2}), \quad x_{k+1} = x_{k+1/2} + \frac{h}{2} M^{-1} p_{k+1}.$
2. Two-stage integrator:
  $x_1 = x_k + (a_1 h) M^{-1} p_k, \quad p_1 = p_k - (b_1 h) \nabla_x \mathcal{J}(x_1), \quad x_2 = x_1 + (a_2 h) M^{-1} p_1, \quad p_{k+1} = p_1 - (b_1 h) \nabla_x \mathcal{J}(x_2), \quad x_{k+1} = x_2 + (a_1 h) M^{-1} p_{k+1},$
  with $a_1 = 0.21132$, $a_2 = 1 - 2a_1$, $b_1 = 0.5$.
◮ Metropolis-Hastings acceptance probability:
  $a^{(k)} = 1 \wedge e^{-\Delta H}, \qquad \Delta H = H(p^*, x^*) - H(p_k, x_k),$
  $x_{k+1} = x^*$ with probability $a^{(k)}$, and $x_{k+1} = x_k$ with probability $1 - a^{(k)}$.
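As a rough illustration of the machinery above, here is a minimal Python sketch of HMC with the position Verlet integrator and a unit mass matrix; the target density, step size, and number of steps are illustrative assumptions, not the settings used in this talk.

```python
import numpy as np

def hmc_sample(grad_J, J, x0, n_samples=1000, h=0.05, n_steps=10, rng=None):
    """Minimal HMC sampler targeting pi(x) ~ exp(-J(x)) with unit mass matrix."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_samples):
        p = rng.standard_normal(x.shape)             # synthetic momentum p ~ N(0, I)
        x_prop, p_prop = x.copy(), p.copy()
        for _ in range(n_steps):                     # position Verlet steps
            x_prop += 0.5 * h * p_prop               # half drift
            p_prop -= h * grad_J(x_prop)             # full kick
            x_prop += 0.5 * h * p_prop               # half drift
        # Metropolis-Hastings accept/reject based on the change in the Hamiltonian
        dH = (J(x_prop) + 0.5 * p_prop @ p_prop) - (J(x) + 0.5 * p @ p)
        if rng.random() < np.exp(-dH):
            x = x_prop
        samples.append(x.copy())
    return np.array(samples)

# Example: sample a 2D standard Gaussian, J(x) = 0.5*||x||^2
if __name__ == "__main__":
    J = lambda x: 0.5 * x @ x
    grad_J = lambda x: x
    chain = hmc_sample(grad_J, J, x0=np.zeros(2))
    print(chain.mean(axis=0), chain.var(axis=0))
```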

9. Hybrid Monte-Carlo (HMC) sampling: Examples
Code is available from: https://www.mcs.anl.gov/~attia/software.html
◮ MH Sampling  ◮ HMC Sampling
[Figures: sampling results comparing Metropolis-Hastings and HMC.]

10. 1- HMC sampling filter† (for sequential DA): The analysis step
Assimilate the given information (e.g., background and observations) at a single time instance $t_k$.
◮ Gaussian framework:
  $\mathcal{P}^b(x) \propto \exp\left(-\frac{1}{2} \| x - x^b \|^2_{B^{-1}}\right); \quad \mathcal{P}(y|x) \propto \exp\left(-\frac{1}{2} \| \mathcal{H}(x) - y \|^2_{R^{-1}}\right); \quad \mathcal{P}^a(x) \propto \exp(-\mathcal{J}(x)).$
◮ Potential energy and gradient:
  $\mathcal{J}(x) = \frac{1}{2} \| x - x^b \|^2_{B^{-1}} + \frac{1}{2} \| \mathcal{H}(x) - y \|^2_{R^{-1}},$
  $\nabla_x \mathcal{J}(x) = B^{-1}(x - x^b) + \mathbf{H}^T R^{-1} (\mathcal{H}(x) - y).$
◮ The Hamiltonian: $H(p, x) = \frac{1}{2} p^T M^{-1} p + \mathcal{J}(x).$
† Attia, Ahmed, and Adrian Sandu. "A hybrid Monte Carlo sampling filter for non-Gaussian data assimilation." AIMS Geosciences 1, no. 1 (2015): 41-78.
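For concreteness, a minimal sketch of the potential energy and gradient of this Gaussian-framework analysis step, assuming a linear observation operator given as a matrix; dense inverses are used only because this is a toy-sized sketch. The two returned callables are exactly what the HMC sketch shown earlier consumes.

```python
import numpy as np

def make_potential(x_b, B, y, R, H):
    """Potential energy J(x) and gradient for the HMC filter analysis step,
    assuming a linear observation operator H (a matrix)."""
    B_inv = np.linalg.inv(B)   # acceptable only for small toy problems
    R_inv = np.linalg.inv(R)

    def J(x):
        dx = x - x_b
        dy = H @ x - y
        return 0.5 * dx @ B_inv @ dx + 0.5 * dy @ R_inv @ dy

    def grad_J(x):
        return B_inv @ (x - x_b) + H.T @ R_inv @ (H @ x - y)

    return J, grad_J
```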

11. 1- HMC sampling filter (for sequential DA): Numerical experiments, setup
◮ The model (Lorenz-96):
  $\frac{dx_i}{dt} = x_{i-1}(x_{i+1} - x_{i-2}) - x_i + F; \quad i = 1, 2, \ldots, 40,$
  ◦ $x \in \mathbb{R}^{40}$ is the state vector, with $x_0 \equiv x_{40}$, and $F = 8$.
◮ Initial background ensemble & uncertainty:
  ◦ Reference IC: $x_0^{True} = \mathcal{M}_{t=0 \to t=5}\left((-2, \ldots, 2)^T\right)$
  ◦ $B_0 = \sigma_0 I \in \mathbb{R}^{N_{state} \times N_{state}}$, with $\sigma_0 = 0.08\, \|x_0^{True}\|_2$
◮ Observations:
  ◦ $\sigma_{obs} = 5\%$ of the average magnitude of the observed reference trajectory
  ◦ $R = \sigma_{obs} I \in \mathbb{R}^{N_{obs} \times N_{obs}}$
  ◦ Synthetic observations are generated every 20 time steps. The linear operator observes $(x_1, x_4, x_7, \ldots, x_{37}, x_{40})^T \in \mathbb{R}^{14}$; the nonlinear operator returns $\mathcal{H}(x) = (x'_1, x'_4, x'_7, \ldots, x'_{37}, x'_{40})^T \in \mathbb{R}^{14}$ with $x'_i = x_i^2$ if $x_i \ge 0.5$ and $x'_i = -x_i^2$ if $x_i < 0.5$.
◮ Benchmark EnKF flavor: DEnKF with Gaspari-Cohn (GC) localization.
Experiments are carried out using DATeS:
  - Ahmed Attia and Adrian Sandu, DATeS: A Highly-Extensible Data Assimilation Testing Suite, Geosci. Model Dev. Discuss., https://doi.org/10.5194/gmd-2018-30, in review, 2018.
  - http://people.cs.vt.edu/~attia/DATeS/
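A minimal sketch of the Lorenz-96 tendency described above (F = 8, periodic indexing); the RK4 time stepper is an assumed choice for illustration only, since the experiments in the talk are run with DATeS.

```python
import numpy as np

def lorenz96_rhs(x, F=8.0):
    """Lorenz-96 tendency dx_i/dt = x_{i-1}(x_{i+1} - x_{i-2}) - x_i + F (periodic)."""
    return np.roll(x, 1) * (np.roll(x, -1) - np.roll(x, 2)) - x + F

def rk4_step(x, dt, F=8.0):
    """One classical RK4 step (an assumed integrator, just for this sketch)."""
    k1 = lorenz96_rhs(x, F)
    k2 = lorenz96_rhs(x + 0.5 * dt * k1, F)
    k3 = lorenz96_rhs(x + 0.5 * dt * k2, F)
    k4 = lorenz96_rhs(x + dt * k3, F)
    return x + dt / 6.0 * (k1 + 2*k2 + 2*k3 + k4)
```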

12. 1- HMC sampling filter (for sequential DA): Numerical experiments, results with linear $\mathcal{H}$
[Figures: RMSE time series (Forecast, EnKF, HMC) and rank histograms for $x_1$ and $x_2$, for two HMC configurations.]
Left: position Verlet symplectic integrator with $T = 0.1$, $h = 0.01$, $\ell = 10$, and 10 mixing steps. Right: two-stage symplectic integrator with the same settings. The (log) RMSE reported for the HMC filter is the average over 100 realizations of the filter.

13. 1- HMC sampling filter (for sequential DA): Numerical experiments, results with discontinuous quadratic $\mathcal{H}$
[Figures: RMSE time series (Forecast, IEnKF, MLEF, HMC) and rank histograms for observed and unobserved components of the state vector, with $N_{ens} = 30$ and $N_{ens} = 10$.]
Three-stage symplectic integrator with $T = 0.1$, $h = 0.01$, $\ell = 10$, and 10 mixing steps. The (log) RMSE reported for the HMC filter is the average over 100 realizations of the filter.

14. 1- HMC sampling filter (for sequential DA): Numerical experiments, accuracy vs. cost
[Figures: RMSE vs. CPU time per assimilation cycle for EnKF, MLEF, IEnKF, and HMC with Verlet, 2-stage, 3-stage, 4-stage, and Hilbert integrators; (a) linear observation operator, (b) quadratic observation operator with a threshold.]
RMSE vs. CPU time per assimilation cycle of DA with the Lorenz-96 model. The reported time is the average CPU time over 100 identical runs of each experiment. The ensemble size is fixed to 30 members for all experiments.

15-17. Relaxing the Gaussian-Prior Assumption
◮ To this point, we have assumed that "the prior can be well-approximated by a Gaussian."
◮ In practice, the prior is generally expected to be non-Gaussian.
◮ The prior PDF can hardly be formulated explicitly, or even up to a scaling factor.
◮ Can we efficiently approximate the prior distribution given the ensemble of forecasts?
◮ Idea: approximate the prior density by fitting a Gaussian mixture model (GMM) to the forecast ensemble† (e.g., using EM).
† Attia, Ahmed, Azam Moosavi, and Adrian Sandu. "Cluster sampling filters for non-Gaussian data assimilation." Atmosphere 9, no. 6 (2018): 213.
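A minimal sketch of the idea above: fit a GMM to the forecast ensemble with EM via scikit-learn, selecting the number of components by AIC (the ClHMC experiments later use AIC-based selection). The helper name, `max_components`, and `reg_covar` value are assumptions, and a full-covariance fit like this is only sensible for low-dimensional toy ensembles.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gmm_prior(forecast_ensemble, max_components=5):
    """Fit a GMM to the forecast ensemble (rows = members) via EM,
    selecting the number of components by AIC."""
    best, best_aic = None, np.inf
    for n_c in range(1, max_components + 1):
        gmm = GaussianMixture(n_components=n_c, covariance_type="full",
                              reg_covar=1e-6).fit(forecast_ensemble)
        aic = gmm.aic(forecast_ensemble)
        if aic < best_aic:
            best, best_aic = gmm, aic
    return best  # weights_, means_, covariances_ define the GMM prior
```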

18. 2- Sampling filters with GMM prior; cluster sampling filters
◮ Prior (GMM):
  $\mathcal{P}^b(x_k) = \sum_{i=1}^{n_c} \tau_{k,i}\, \mathcal{N}(\mu_{k,i}, \Sigma_{k,i}) = \sum_{i=1}^{n_c} \tau_{k,i}\, (2\pi)^{-\frac{N_{state}}{2}} |\Sigma_{k,i}|^{-\frac{1}{2}} \exp\left(-\frac{1}{2} \| x_k - \mu_{k,i} \|^2_{\Sigma_{k,i}^{-1}}\right),$
  where $\tau_{k,i} = \mathbb{P}(x_k(e) \in i\text{-th component})$, and $(\mu_{k,i}, \Sigma_{k,i})$ are the mean and covariance matrix associated with the $i$-th component.
◮ Likelihood:
  $\mathcal{P}(y_k | x_k) = (2\pi)^{-\frac{N_{obs}}{2}} |R_k|^{-\frac{1}{2}} \exp\left(-\frac{1}{2} \| \mathcal{H}_k(x_k) - y_k \|^2_{R_k^{-1}}\right)$
◮ Posterior:
  $\mathcal{P}^a(x_k) \propto \sum_{i=1}^{n_c} \tau_{k,i}\, |\Sigma_{k,i}|^{-\frac{1}{2}} \exp\left(-\frac{1}{2} \| x_k - \mu_{k,i} \|^2_{\Sigma_{k,i}^{-1}} - \frac{1}{2} \| \mathcal{H}_k(x_k) - y_k \|^2_{R_k^{-1}}\right)$

19. 2- Sampling filters with GMM prior; cluster sampling filters: HMC sampling filter with GMM prior (ClHMC)
◮ Potential energy and gradient, with $J_{k,i}(x_k) = \frac{1}{2}\| x_k - \mu_{k,i} \|^2_{\Sigma_{k,i}^{-1}}$ and $\nabla_{x_k} J_{k,i}(x_k) = \Sigma_{k,i}^{-1}(x_k - \mu_{k,i})$, $i = 1, 2, \ldots, n_c$:
  $\mathcal{J}(x_k) = \frac{1}{2}\| \mathcal{H}_k(x_k) - y_k \|^2_{R_k^{-1}} - \log\left( \sum_{i=1}^{n_c} \frac{\tau_{k,i}}{\sqrt{|\Sigma_{k,i}|}} \exp\left(-J_{k,i}(x_k)\right) \right)$
  $\phantom{\mathcal{J}(x_k)} = \frac{1}{2}\| \mathcal{H}_k(x_k) - y_k \|^2_{R_k^{-1}} + J_{k,1}(x_k) - \log\frac{\tau_{k,1}}{\sqrt{|\Sigma_{k,1}|}} - \log\left( 1 + \sum_{i=2}^{n_c} \frac{\tau_{k,i}}{\tau_{k,1}} \sqrt{\frac{|\Sigma_{k,1}|}{|\Sigma_{k,i}|}} \exp\left(J_{k,1}(x_k) - J_{k,i}(x_k)\right) \right),$
  $\nabla_{x_k} \mathcal{J}(x_k) = \mathbf{H}_k^T R_k^{-1}\left(\mathcal{H}_k(x_k) - y_k\right) + \nabla_{x_k} J_{k,1}(x_k) - \frac{ \sum_{i=2}^{n_c} \frac{\tau_{k,i}}{\tau_{k,1}} \sqrt{\frac{|\Sigma_{k,1}|}{|\Sigma_{k,i}|}} \exp\left(J_{k,1}(x_k) - J_{k,i}(x_k)\right) \left( \nabla_{x_k} J_{k,1} - \nabla_{x_k} J_{k,i} \right) }{ 1 + \sum_{j=2}^{n_c} \frac{\tau_{k,j}}{\tau_{k,1}} \sqrt{\frac{|\Sigma_{k,1}|}{|\Sigma_{k,j}|}} \exp\left(J_{k,1}(x_k) - J_{k,j}(x_k)\right) }.$
◮ The Hamiltonian: $H(p_k, x_k) = \frac{1}{2} p_k^T M^{-1} p_k + \mathcal{J}(x_k).$
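A minimal sketch of the corresponding GMM-prior potential energy and gradient, assuming a linear observation operator; it evaluates the mixture density directly with SciPy, which is less numerically robust than the factored form on the slide but easier to read. All names here are hypothetical.

```python
import numpy as np
from scipy.stats import multivariate_normal

def make_gmm_potential(tau, mus, Sigmas, y, R, H):
    """Potential energy J(x) = -log[GMM prior(x) * Gaussian likelihood(x)] (up to a constant),
    and its gradient, assuming a linear observation operator H."""
    R_inv = np.linalg.inv(R)
    Sig_invs = [np.linalg.inv(S) for S in Sigmas]

    def J(x):
        misfit = H @ x - y
        prior = sum(t * multivariate_normal.pdf(x, mean=m, cov=S)
                    for t, m, S in zip(tau, mus, Sigmas))
        return 0.5 * misfit @ R_inv @ misfit - np.log(prior)

    def grad_J(x):
        # responsibilities: relative weight of each mixture component at x
        comps = np.array([t * multivariate_normal.pdf(x, mean=m, cov=S)
                          for t, m, S in zip(tau, mus, Sigmas)])
        resp = comps / comps.sum()
        grad_prior = sum(r * Si @ (x - m) for r, m, Si in zip(resp, mus, Sig_invs))
        return H.T @ R_inv @ (H @ x - y) + grad_prior

    return J, grad_J
```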

20-21. 2- Sampling filters with GMM prior; cluster sampling filters: Limitations & multi-chain samplers (MC-ClHMC, MC-ClMCMC)
◮ If the GMM has too many components, ClHMC may collapse (i.e., filter degeneracy).
◮ This can be avoided by forcing the sampler to collect ensemble members from all probability modes.
  Idea: construct a Markov chain to sample each of the components of the posterior → multi-chain cluster sampling filters (MC-ClMCMC, MC-ClHMC).
◮ The parameters of each chain can be tuned locally:
  - Chains are initialized at the components' means in the prior mixture.
  - The local ensemble size (sample size per chain) can be specified, for example, based on the prior weight of the corresponding component multiplied by the likelihood of the mean of that component.

22. 2- HMC sampling filter with GMM prior; cluster sampling filters: Numerical experiments, setup
◮ The model (QG-1.5):
  $q_t = \psi_x - \varepsilon J(\psi, q) - A \Delta^3 \psi + 2\pi \sin(2\pi y), \qquad q = \Delta\psi - F\psi, \qquad J(\psi, q) \equiv \psi_x q_y - \psi_y q_x,$
  where $\Delta \equiv \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$.
  - State vector: $\psi \in \mathbb{R}^{16641}$.
  - Model subspace dimension of the order of $10^2 - 10^3$.
  - $\psi$ is interpreted as either a stream function or surface elevation.
  - Here $F = 1600$, $\varepsilon = 10^{-5}$, and $A = 2 \times 10^{-12}$.
  - Boundary conditions: $\psi = \Delta\psi = \Delta^2\psi = 0$.
◮ The observations:
  1. A linear operator with random offset.
  2. A flow-velocity magnitude operator: $\mathcal{H}: \psi \to \sqrt{u^2 + v^2}$, with $u = +\frac{\partial \psi}{\partial y}$, $v = -\frac{\partial \psi}{\partial x}$; $\mathcal{H}: \mathbb{R}^{16641} \to \mathbb{R}^{300}$.

23. 2- HMC sampling filter with GMM prior; cluster sampling filters: Numerical experiments, QG-1.5 results with linear $\mathcal{H}$
[Figures: RMSE time series of the analyses obtained by DEnKF, HMC, ClHMC, and MC-ClHMC, and rank histograms for (a) DEnKF, (b) HMC, (c) ClHMC+AIC, (d) MC-ClHMC+AIC.]
Data assimilation results with the linear observation operator. The ensemble size is 25. A 3-stage symplectic integrator is used, with $h = 0.0075$, $\ell = 25$ for HMC and ClHMC, and $h = 0.05 / n_c$, $\ell = 15$ for MC-ClHMC. The rank histograms show where the truth ranks among the posterior ensemble members; ranks are evaluated for every 16th variable in the state vector (past the correlation bound) at 100 assimilation times.

24. 2- HMC sampling filter with GMM prior; cluster sampling filters: Numerical experiments, QG-1.5 results with flow-magnitude $\mathcal{H}$
[Figures: RMSE time series of the analyses obtained by HMC, ClHMC, and MC-ClHMC, and rank histograms for (a) HMC, (b) ClHMC+AIC, (c) MC-ClHMC+AIC.]
In this experiment, the EnKF analysis diverged after the third cycle, and its RMSE results have been omitted for clarity. The rank histograms show where the truth ranks among the posterior ensemble members; ranks are evaluated for every 16th variable in the state vector (past the correlation bound) at 100 assimilation times. The filtering scheme used is indicated under each panel.

25. 3, 4- HMC sampling smoothers
Assimilate a set of observations $y_0, y_1, \ldots, y_m$ at once, to correct a background state $x_0$.
[Schematic: the assimilation cycle uses the forward model $\mathcal{M}: t_0 \to t_k$, the observations and their likelihood, and the adjoint model $\mathcal{M}^*: t_k \to t_0$ to produce the corrected IC (analysis).]
1. Attia, Ahmed, Vishwas Rao, and Adrian Sandu. "A sampling approach for four dimensional data assimilation." In Dynamic Data-Driven Environmental Systems Science, pp. 215-226. Springer, Cham, 2015.
2. Attia, Ahmed, Vishwas Rao, and Adrian Sandu. "A hybrid Monte-Carlo sampling smoother for four-dimensional data assimilation." International Journal for Numerical Methods in Fluids 83, no. 1 (2017): 90-112.
3. Attia, Ahmed, Razvan Stefanescu, and Adrian Sandu. "The reduced-order hybrid Monte-Carlo sampling smoother." International Journal for Numerical Methods in Fluids 83, no. 1 (2017): 28-51.

26. Outline
Motivation
Bayesian Inversion & Data Assimilation
A Resampling Family for Non-Gaussian DA
  1- HMC sampling filter
  2- Cluster sampling filters
  3- HMC sampling smoother
  4- Reduced-order HMC smoother
Optimal Design of Experiments (ODE)
  Bayesian inversion & sensor placement
  Goal-Oriented approach for ODE (GOODE)
EnKF Inflation & Localization
  OED-based inflation & localization
Concluding Remarks & Future Plans

27-29. Optimal Experimental Design
◮ Sensor placement for optimal parameter recovery.
  Experimental design: $\xi := \begin{pmatrix} y^1, \ldots, y^{N_s} \\ w_1, \ldots, w_{N_s} \end{pmatrix}$
  $y^1, \ldots, y^{N_s}$: candidate sensor locations; we can vary the weights $w_i = 0$ (sensor inactive) or $w_i = 1$ (sensor active).
◮ Find the best $r$ sensor locations so as to maximize some utility function (e.g., identification accuracy, information gain, etc.).
◮ Challenges:
  1. Brute-force search for an optimal design is combinatorially prohibitive: it requires $\binom{N_s}{r}$ function evaluations; e.g., for $N_s = 35$ and $r = 10$, that is $\sim 2 \times 10^8$ function evaluations.
  2. Each function evaluation is prohibitively expensive:
     * the covariance matrix can have over $10^{12}$ entries ($\sim 8$ TB),
     * the determinant or the trace must be evaluated repeatedly.
◮ Solution strategy:
  - gradient-based optimization with the relaxation $w_i \in [0, 1]$, and
  - sparsifying penalty functions.
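A minimal sketch of the solution strategy above for a small linear-Gaussian toy problem: the weights are relaxed to $w \in [0,1]^{N_s}$, the A-optimality criterion $\mathrm{Tr}(\Gamma_{post}(w))$ is penalized with an $\ell_1$ term, and the bound-constrained problem is handed to SciPy's L-BFGS-B. All sizes, the forward operator, and the penalty value are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_p, n_s = 5, 12                        # parameter and candidate-sensor dimensions (toy sizes)
F = rng.standard_normal((n_s, n_p))     # forward operator (one observation time)
Gamma_pr_inv = np.eye(n_p)              # prior precision
noise_prec = 1.0 / 0.1**2               # observation noise precision (sigma = 0.1)

def a_opt_objective(w, alpha=1e-2):
    """Relaxed A-optimal design: Tr(Gamma_post(w)) + alpha * ||w||_1, with w in [0,1]^{N_s}."""
    H_misfit = F.T @ (F * (w * noise_prec)[:, None])   # F^T W_Gamma F
    Gamma_post = np.linalg.inv(H_misfit + Gamma_pr_inv)
    return np.trace(Gamma_post) + alpha * np.sum(w)    # ||w||_1 = sum(w) since w >= 0

res = minimize(a_opt_objective, x0=np.full(n_s, 0.5),
               bounds=[(0.0, 1.0)] * n_s, method="L-BFGS-B")
print("relaxed optimal weights:", np.round(res.x, 2))
```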

30. Inverse Problem & Sensor Placement: Bayesian inverse problem, Gaussian framework
◮ Forward operator: $y = F(\theta) + \eta$; $\eta \sim \mathcal{N}(0, \Gamma_{noise})$
◮ The prior and the likelihood:
  $\mathcal{P}(\theta) = \mathcal{N}(\theta_{pr}, \Gamma_{pr}), \qquad \mathcal{P}(y | \theta) = \mathcal{N}(F(\theta), \Gamma_{noise}).$
  For a time-dependent model with temporally-uncorrelated observational noise, $\Gamma_{noise}$ is block diagonal with the $k$-th block equal to $R_k$, the observation error covariance at time instance $t_k$.
◮ The posterior: $\mathcal{N}(\theta^y_{post}, \Gamma_{post})$, with
  $\Gamma_{post} = \left( F^* \Gamma_{noise}^{-1} F + \Gamma_{pr}^{-1} \right)^{-1} \equiv \left( \mathcal{H}_{misfit} + \Gamma_{pr}^{-1} \right)^{-1} = \mathcal{H}^{-1},$
  $\theta^y_{post} = \Gamma_{post} \left( \Gamma_{pr}^{-1} \theta_{pr} + F^* \Gamma_{noise}^{-1} y \right),$
  where
  * $F^*$ is the adjoint of the forward operator $F$,
  * $\mathcal{H}$ is the Hessian of the negative log posterior,
  * $\mathcal{H}_{misfit}$ is the data-misfit term of the Hessian (the Hessian-misfit).

31-32. Experimental Design: Standard formulation
◮ The design $w$ enters the Bayesian inverse problem through the data likelihood:
  $\pi_{like}(y | \theta; w) \propto \exp\left( -\frac{1}{2} (F(\theta) - y)^T W_\Gamma (F(\theta) - y) \right); \qquad W_\Gamma = \Gamma_{noise}^{-1/2} W \Gamma_{noise}^{-1/2},$
  where $W = I_m \otimes W_s$, $W_s = \mathrm{diag}(w_1, \ldots, w_{N_s})$, and $\otimes$ is the Kronecker product.
◮ Given the weighted likelihood, the posterior covariance of $\theta$ reads:
  $\Gamma_{post}(w) = \left[ \mathcal{H}(w) \right]^{-1} = \left( F^* W_\Gamma F + \Gamma_{pr}^{-1} \right)^{-1} = \left( \mathcal{H}_{misfit}(w) + \Gamma_{pr}^{-1} \right)^{-1}$
◮ Standard approach for ODE: find $w$ that minimizes the posterior uncertainty, e.g.:
  ◮ A-optimality: $\mathrm{Tr}(\Gamma_{post})$
  ◮ D-optimality: $\det(\Gamma_{post})$
  ◮ etc.

33. Experimental Design: Goal-oriented formulation
What if we are interested in a prediction quantity $\rho = P(\theta)$, rather than the parameter itself? E.g., the average contaminant concentration within a specific distance from the buildings' walls.
→ Goal-Oriented ODE (GOODE)
[Figure: candidate sensor weights $w$ over the domain.]

34-36. GOODE
◮ Consider a linear prediction $\rho = P\theta$, where $P$ is a linear prediction operator.
◮ In the linear-Gaussian setting, $\rho$ follows a Gaussian prior $\mathcal{N}(\rho_{pr}, \Sigma_{pr})$, with
  $\rho_{pr} = P \theta_{pr}, \qquad \Sigma_{pr} = P \Gamma_{pr} P^*.$
◮ Given the observation $y$ and the design $w$, the posterior distribution of $\rho$ is $\mathcal{N}(\rho_{post}, \Sigma_{post})$, with
  $\rho_{post} = P \theta^y_{post}, \qquad \Sigma_{post} = P \Gamma_{post} P^* = P \mathcal{H}^{-1} P^* = P \left( \mathcal{H}_{misfit} + \Gamma_{pr}^{-1} \right)^{-1} P^*.$
◮ GOODE objective: find the design $w$ that minimizes the uncertainty in $\rho$.
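Extending the earlier toy sketch to the goal-oriented setting above: the criteria are evaluated on $\Sigma_{post}(w) = P\, \Gamma_{post}(w)\, P^T$ for an assumed linear prediction operator $P$, with the A-optimality (trace) and D-optimality (log-determinant) values returned side by side. All quantities here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_p, n_s, n_pred = 5, 12, 2
F = rng.standard_normal((n_s, n_p))          # forward operator (toy)
Gamma_pr_inv = np.eye(n_p)                   # prior precision
noise_prec = 1.0 / 0.1**2                    # observation noise precision
P = rng.standard_normal((n_pred, n_p))       # prediction operator (assumed)

def goode_criteria(w):
    Gamma_post = np.linalg.inv(F.T @ (F * (w * noise_prec)[:, None]) + Gamma_pr_inv)
    Sigma_post = P @ Gamma_post @ P.T        # posterior covariance of rho = P theta
    a_opt = np.trace(Sigma_post)             # G-O A-optimality
    d_opt = np.linalg.slogdet(Sigma_post)[1] # G-O D-optimality (log det)
    return a_opt, d_opt

print(goode_criteria(np.ones(n_s)))          # all sensors active
```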

37-38. GOODE: A-Optimality, space-time formulation
The G-O A-optimal design $w^{GA}_{opt}$:
  $w^{GA}_{opt} = \arg\min_{w \in \mathbb{R}^{N_s}} \mathrm{Tr}\left( \Sigma_{post}(w) \right) + \alpha \|w\|, \qquad \text{s.t. } 0 \le w_i \le 1, \; i = 1, \ldots, N_s.$
◮ The gradient (discarding the regularization term):
  $\nabla_w \mathrm{Tr}\left( \Sigma_{post}(w) \right) = -\sum_{i=1}^{N_{pred}} \zeta_i \odot \zeta_i,$
  where $\zeta_i = \Gamma_{noise}^{-1/2} F [\mathcal{H}(w)]^{-1} P^* e_i$, and $e_i$ is the $i$-th coordinate vector in $\mathbb{R}^{N_{pred}}$. Here, $\odot$ is the pointwise (Hadamard) product.

39. GOODE: A-Optimality, 4D-Var formulation
◮ Efficient computation of the gradient: for temporally-uncorrelated observational noise, the gradient is
  $\nabla_w \mathrm{Tr}\left( \Sigma_{post}(w) \right) = -\sum_{k=1}^{m} \sum_{j=1}^{N_{pred}} \zeta_{k,j} \odot \zeta_{k,j}, \qquad \zeta_{k,j} = R_k^{-1/2} F_{0,k} [\mathcal{H}(w)]^{-1} P^* e_j,$
  where
  * $e_j$ is the $j$-th coordinate vector in $\mathbb{R}^{N_{pred}}$,
  * $F_{0,k}$ is the forward operator that maps the parameter to the equivalent observation at time instance $t_k$; $k = 1, 2, \ldots, m$.

40. GOODE: D-Optimality, 4D-Var formulation
The G-O D-optimal design $w^{GD}_{opt}$:
  $w^{GD}_{opt} = \arg\min_{w \in \mathbb{R}^{N_s}} \log\det\left( \Sigma_{post}(w) \right) + \alpha \|w\|, \qquad \text{s.t. } 0 \le w_i \le 1, \; i = 1, \ldots, N_s.$
◮ The gradient (discarding the regularization term):
  $\nabla_w \left( \log\det\left( \Sigma_{post}(w) \right) \right) = -\sum_{k=1}^{m} \sum_{j=1}^{N_{pred}} \xi_{k,j} \odot \xi_{k,j}, \qquad \xi_{k,j} = R_k^{-1/2} F_{0,k} [\mathcal{H}(w)]^{-1} P^* \Sigma_{post}^{-1/2}(w)\, e_j,$
  where
  1. $e_j$ is the $j$-th coordinate vector in $\mathbb{R}^{N_{pred}}$,
  2. $\Sigma_{post}^{-1}(w) = \Sigma_{post}^{-1/2}(w)\, \Sigma_{post}^{-1/2}(w)$.

41. GOODE: D-Optimality, alternative 4D-Var formulation
◮ Efficient computation of the gradient: for temporally-uncorrelated observational noise, the gradient is equivalent to:
  $\nabla_w \left( \log\det\left( \Sigma_{post}(w) \right) \right) = -\sum_{i=1}^{N_s} \sum_{k=1}^{m} e_i \left( \eta_{k,i}^T \Sigma_{post}^{-1} \eta_{k,i} \right), \qquad \eta_{k,i} = P [\mathcal{H}(w)]^{-1} F^*_{k,0} R_k^{-1/2} e_i,$
  where $e_i$ is the $i$-th coordinate vector in $\mathbb{R}^{N_s}$, i.e., in the observation space.

42. GOODE Experiments using the Advection-Diffusion Model: Setup I
[Figure: domain with two buildings (B1, B2), observational grid, and velocity field.]
◮ Numerical model (A-D): $u$ solves
  $u_t - \kappa \Delta u + v \cdot \nabla u = 0 \;\text{ in } \Omega \times [0, T], \qquad u(0, x) = u_0 \;\text{ in } \Omega, \qquad \kappa \nabla u \cdot n = 0 \;\text{ on } \partial\Omega \times [0, T],$
  * $\Omega \subset \mathbb{R}^2$ is an open and bounded domain,
  * $u$ is the concentration of a contaminant in the domain $\Omega$,
  * $\kappa$ is the diffusivity, and $v$ is the velocity field.
◮ Observations: $N_s = 22$ candidate sensor locations, with
  * $t_0 = 0$ and $T = 0.8$,
  * observations taken at the time instances $\{t_k\} = \{0.4, 0.6, 0.8\}$.
GOODE experiments are implemented in hIPPYlib: https://hippylib.github.io/

43. GOODE Experiments using the Advection-Diffusion Model: Setup II
Predictions: $P$ predicts $u$ at the degrees of freedom of the FE discretization within distance $\epsilon$ from one or both buildings at $t_{pred}$.
  - Vector-valued prediction ($u$ within distance $\epsilon$ from the internal boundaries at time $t_{pred}$): $P_{v0}$ (B1), $P_{v1}$ (B2), $P_{v2}$ (B1 & B2).
  - Scalar-valued prediction (the "average" $u$ within distance $\epsilon$ from the internal boundaries at time $t_{pred}$): $P_{s0} \equiv v^T P_{v0}$, $P_{s1} \equiv v^T P_{v1}$, $P_{s2} \equiv v^T P_{v2}$.
The vector-valued operators predict the value of $u$ at the prediction grid points at the prediction time; the scalar-valued operators average the vector-valued prediction QoI, i.e., $v = \left( \frac{1}{N_{pred}}, \ldots, \frac{1}{N_{pred}} \right)^T \in \mathbb{R}^{N_{pred}}$.
A-GOODE results are shown for:
  Prediction operator | $t_{pred}$ | $\epsilon$ | $N_{pred}$
  $P_{v0}$ | 1.0 | 0.02 | 164
  $P_{v1}$ | 1.0 | 0.02 | 138
  $P_{v2}$ | 1.0 | 0.02 | 302
Regularization: the $\ell_1$ norm is used.
Attia, Ahmed, Alen Alexanderian, and Arvind K. Saibaba. "Goal-Oriented Optimal Design of Experiments for Large-Scale Bayesian Linear Inverse Problems." Inverse Problems, Vol. 34, Number 9, 095009 (2018).

44. Numerical Results: $P = P_{v*}$; A-GOODE
[Figures: optimal weights (top row) and corresponding active sensors (bottom row) for $P_{v0}$, $P_{v1}$, and $P_{v2}$, each with $\alpha = 10^{-4}$.]
The optimal weights $\{w_i\}_{i=1,\ldots,N_s}$ are plotted on the z-axis, normalized to add up to 1 (top row); the corresponding active sensors are plotted on the bottom row.

45. Choosing the Penalty Parameter: $P = P_{v*}$; A-GOODE
[Figures: optimal objective vs. $\ell_1$ norm (L-curves) and histograms of objective values comparing near-optimal GOODE designs with brute-force designs, for $P_{v0}$ ($\alpha = 5 \times 10^{-3}$), $P_{v1}$ ($\alpha = 10^{-3}$), and $P_{v2}$ ($\alpha = 10^{-2}$).]
A-GOODE results with a sequence of 75 penalty parameter values spaced in $[10^{-7}, 0.2]$.

46. Outline
Motivation
Bayesian Inversion & Data Assimilation
A Resampling Family for Non-Gaussian DA
  1- HMC sampling filter
  2- Cluster sampling filters
  3- HMC sampling smoother
  4- Reduced-order HMC smoother
Optimal Design of Experiments (ODE)
  Bayesian inversion & sensor placement
  Goal-Oriented approach for ODE (GOODE)
EnKF Inflation & Localization
  OED-based inflation & localization
Concluding Remarks & Future Plans

47-49. Ensemble Kalman Filter (EnKF): Assimilation cycle over $[t_{k-1}, t_k]$, forecast step
◮ Initialize: an analysis ensemble $\{x^a_{k-1}(e)\}_{e=1,\ldots,N_{ens}}$ at $t_{k-1}$.
◮ Forecast: use the discretized model $\mathcal{M}_{t_{k-1} \to t_k}$ to generate a forecast ensemble at $t_k$:
  $x^b_k(e) = \mathcal{M}_{t_{k-1} \to t_k}\left( x^a_{k-1}(e) \right) + \eta_k(e), \quad e = 1, \ldots, N_{ens}.$
◮ Forecast/prior statistics:
  $\bar{x}^b_k = \frac{1}{N_{ens}} \sum_{e=1}^{N_{ens}} x^b_k(e),$
  $B_k = \frac{1}{N_{ens} - 1} X^b_k \left( X^b_k \right)^T; \qquad X^b_k = \left[ x^b_k(1) - \bar{x}^b_k, \ldots, x^b_k(N_{ens}) - \bar{x}^b_k \right].$

50-52. Ensemble Kalman Filter (EnKF): Assimilation cycle over $[t_{k-1}, t_k]$, analysis step
◮ Given an observation $y_k$ at time $t_k$.
◮ Analysis: sample the posterior (EnKF update):
  $K_k = B_k \mathbf{H}_k^T \left( \mathbf{H}_k B_k \mathbf{H}_k^T + R_k \right)^{-1},$
  $x^a_k(e) = x^b_k(e) + K_k \left( [y_k + \zeta_k(e)] - \mathcal{H}_k(x^b_k(e)) \right).$
◮ The posterior (analysis) error covariance matrix:
  $A_k = \left( I - K_k \mathbf{H}_k \right) B_k \equiv \left( B_k^{-1} + \mathbf{H}_k^T R_k^{-1} \mathbf{H}_k \right)^{-1}.$
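A minimal sketch of an EnKF analysis step of the form above, in the stochastic (perturbed-observations) variant with a linear observation operator; the benchmark filter in this talk is DEnKF, a deterministic flavor, so this particular update is an illustrative assumption rather than the scheme used in the experiments.

```python
import numpy as np

def enkf_analysis(Xb, y, H, R, rng=None):
    """Perturbed-observations EnKF analysis step.
    Xb: forecast ensemble, shape (N_state, N_ens); H: linear obs operator; R: obs covariance."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y, dtype=float)
    n_state, n_ens = Xb.shape
    xb_mean = Xb.mean(axis=1, keepdims=True)
    Xp = (Xb - xb_mean) / np.sqrt(n_ens - 1)          # scaled forecast anomalies
    B = Xp @ Xp.T                                     # sample covariance (toy sizes only)
    K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)      # Kalman gain
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, size=n_ens).T  # perturbed obs
    return Xb + K @ (Y - H @ Xb)                      # analysis ensemble
```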

53-54. Ensemble Kalman Filter (EnKF): Sequential EnKF issues
◮ A limited-size ensemble results in sampling errors, manifested as:
  - variance underestimation,
  - accumulation of long-range spurious correlations,
  - filter divergence after a few assimilation cycles.
◮ EnKF therefore requires inflation & localization.

55-56. EnKF: Inflation
Space-independent inflation:
  $\widetilde{X}^b = \sqrt{\lambda} \left[ x^b(1) - \bar{x}^b, \ldots, x^b(N_{ens}) - \bar{x}^b \right]; \quad 0 < \lambda_l \le \lambda \le \lambda_u,$
  $\widetilde{B} = \frac{1}{N_{ens} - 1} \widetilde{X}^b \left( \widetilde{X}^b \right)^T = \lambda B.$
Space-dependent inflation: let $D := \mathrm{diag}(\lambda) \equiv \sum_{i=1}^{N_{state}} \lambda_i e_i e_i^T$,
  $\widetilde{X}^b = D^{\frac{1}{2}} X^b, \qquad \widetilde{B} = \frac{1}{N_{ens} - 1} \widetilde{X}^b \left( \widetilde{X}^b \right)^T = D^{\frac{1}{2}} B D^{\frac{1}{2}}.$
The inflated Kalman gain $\widetilde{K}$ and analysis error covariance matrix $\widetilde{A}$:
  $\widetilde{K} = \widetilde{B} \mathbf{H}^T \left( \mathbf{H} \widetilde{B} \mathbf{H}^T + R \right)^{-1}; \qquad \widetilde{A} = \left( I - \widetilde{K} \mathbf{H} \right) \widetilde{B} \equiv \left( \widetilde{B}^{-1} + \mathbf{H}^T R^{-1} \mathbf{H} \right)^{-1}.$
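A minimal sketch applying the inflation above to a forecast ensemble; a scalar lambda reproduces space-independent inflation (B -> lambda*B), a vector lambda the space-dependent version (B -> D^{1/2} B D^{1/2}). Function and argument names are assumptions.

```python
import numpy as np

def inflate_ensemble(Xb, lam):
    """Inflate forecast anomalies about the ensemble mean.
    Xb: ensemble, shape (N_state, N_ens); lam: scalar or vector of length N_state."""
    xb_mean = Xb.mean(axis=1, keepdims=True)
    Xp = Xb - xb_mean
    lam = np.asarray(lam, dtype=float)
    # row-wise scaling by sqrt(lambda_i) realizes B -> D^{1/2} B D^{1/2}
    Xp_infl = np.sqrt(lam)[:, None] * Xp if lam.ndim == 1 else np.sqrt(lam) * Xp
    return xb_mean + Xp_infl
```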

57. EnKF: Schur-Product Localization, state-space formulation ($B$-localization)
Covariance localization: $\widetilde{B} := C \odot B$, with $C = [\rho_{i,j}]_{i,j = 1, 2, \ldots, N_{state}}$.
Entries of $C$ are created using space-dependent localization functions†:
→ Gauss:
  $\rho_{i,j}(L) = \exp\left( \frac{-d(i,j)^2}{2 L^2} \right); \quad i, j = 1, 2, \ldots, N_{state}.$
→ 5th-order Gaspari-Cohn:
  $\rho_{i,j}(L) = \begin{cases} -\frac{1}{4}\left(\frac{d(i,j)}{L}\right)^5 + \frac{1}{2}\left(\frac{d(i,j)}{L}\right)^4 + \frac{5}{8}\left(\frac{d(i,j)}{L}\right)^3 - \frac{5}{3}\left(\frac{d(i,j)}{L}\right)^2 + 1, & 0 \le d(i,j) \le L, \\ \frac{1}{12}\left(\frac{d(i,j)}{L}\right)^5 - \frac{1}{2}\left(\frac{d(i,j)}{L}\right)^4 + \frac{5}{8}\left(\frac{d(i,j)}{L}\right)^3 + \frac{5}{3}\left(\frac{d(i,j)}{L}\right)^2 - 5\left(\frac{d(i,j)}{L}\right) + 4 - \frac{2}{3}\frac{L}{d(i,j)}, & L \le d(i,j) \le 2L, \\ 0, & 2L \le d(i,j). \end{cases}$
† $d(i,j)$: distance between the $i$-th and $j$-th grid points; $L \equiv L(i,j)$: radius of influence, i.e., localization radius, for the $i$-th and $j$-th grid points.
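A minimal sketch of the 5th-order Gaspari-Cohn function above and of Schur-product ($B$-)localization on a one-dimensional periodic grid; the periodic distance is an assumption that matches a ring model like Lorenz-96, and the dense covariance is only sensible for toy sizes.

```python
import numpy as np

def gaspari_cohn(d, L):
    """5th-order Gaspari-Cohn correlation of distance d with localization radius L."""
    r = np.atleast_1d(np.asarray(d, dtype=float)) / L
    rho = np.zeros_like(r)
    near = r <= 1.0
    mid = (r > 1.0) & (r <= 2.0)
    rho[near] = (-0.25*r[near]**5 + 0.5*r[near]**4 + 0.625*r[near]**3
                 - (5.0/3.0)*r[near]**2 + 1.0)
    rho[mid] = ((1.0/12.0)*r[mid]**5 - 0.5*r[mid]**4 + 0.625*r[mid]**3
                + (5.0/3.0)*r[mid]**2 - 5.0*r[mid] + 4.0 - (2.0/3.0)/r[mid])
    return rho

def localize_B(B, L, n):
    """Schur-product localization C(L) ⊙ B on a 1D periodic grid of n points."""
    idx = np.arange(n)
    d = np.minimum(np.abs(idx[:, None] - idx[None, :]),
                   n - np.abs(idx[:, None] - idx[None, :]))
    return gaspari_cohn(d, L) * B
```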

58-61. EnKF: Schur-Product Localization, observation-space formulation ($R$-localization)
◮ Localization in observation space ($R$-localization):
  ◮ $\mathbf{H}B$ is replaced with $\widetilde{\mathbf{H}B} = C_{loc,1} \odot \mathbf{H}B$, where $C_{loc,1} = \left[ \rho^{o|m}_{i,j} \right]$; $i = 1, 2, \ldots, N_{obs}$; $j = 1, 2, \ldots, N_{state}$.
  ◮ $\mathbf{H}B\mathbf{H}^T$ can be replaced with $\widetilde{\mathbf{H}B\mathbf{H}^T} = C_{loc,2} \odot \mathbf{H}B\mathbf{H}^T$, where $C_{loc,2} \equiv C^{o|o} = \left[ \rho^{o|o}_{i,j} \right]$; $i, j = 1, 2, \ldots, N_{obs}$.
  - $\rho^{o|m}_{i,j}$ is calculated between the $i$-th observation grid point and the $j$-th model grid point.
  - $\rho^{o|o}_{i,j}$ is calculated between the $i$-th and $j$-th observation grid points.
◮ Assigning radii to state grid points vs. observation grid points:
  - Attach $L \in \mathbb{R}^{N_{state}}$ to the model grid points, and project to observations for $C_{loc,2}$ [hard/unknown].
  - Attach $L \in \mathbb{R}^{N_{obs}}$ to the observation grid points [efficient; followed here].
The parameters $\lambda \in \mathbb{R}^{N_{state}}$ and $L \in (\mathbb{R}^{N_{state}} \text{ or } \mathbb{R}^{N_{obs}})$ are generally tuned empirically!
We propose an OED approach to automatically tune these parameters.

62-63. OED Approach for Adaptive Inflation
The A-optimal design (inflation parameter $\lambda^{A-opt}$) minimizes:
  $\min_{\lambda \in \mathbb{R}^{N_{state}}} \mathrm{Tr}\left( \widetilde{A}(\lambda) \right) - \alpha \| \lambda - \mathbf{1} \|_1, \qquad \text{subject to } \lambda^l_i \le \lambda_i \le \lambda^u_i, \; i = 1, \ldots, N_{state}.$
Remark: we choose the sign of the regularization term to be negative, unlike the traditional formulation.
◮ Let $\mathcal{H} = \mathbf{H} = I$ with uncorrelated observation noise; the design criterion becomes:
  $\Psi^{Infl}(\lambda) := \mathrm{Tr}\left( \widetilde{A} \right) = \sum_{i=1}^{N_{state}} \left( \lambda_i^{-1} \sigma_i^{-2} + r_i^{-2} \right)^{-1}$
◮ Decreasing $\lambda_i$ reduces $\Psi^{Infl}$, i.e., the optimizer will always move toward $\lambda^l$; the negative-signed penalty counteracts this.

64. OED Approach for Adaptive Inflation
Solving the A-OED problem requires evaluating the objective and the gradient.
◮ The design criterion:
  $\Psi^{Infl}(\lambda) := \mathrm{Tr}\left( \widetilde{A} \right) = \mathrm{Tr}\left( \widetilde{B} \right) - \mathrm{Tr}\left( \widetilde{B} \mathbf{H}^T \left( R + \mathbf{H}\widetilde{B}\mathbf{H}^T \right)^{-1} \mathbf{H} \widetilde{B} \right)$
◮ The gradient:
  $\nabla_\lambda \Psi^{Infl}(\lambda) = \sum_{i=1}^{N_{state}} \lambda_i^{-1} e_i e_i^T \left( z_1 - z_2 - z_3 + z_4 \right),$
  $z_1 = \widetilde{B} e_i, \quad z_2 = \widetilde{B} \mathbf{H}^T \left( R + \mathbf{H}\widetilde{B}\mathbf{H}^T \right)^{-1} \mathbf{H} z_1, \quad z_3 = \widetilde{B}\widetilde{B} \mathbf{H}^T \left( R + \mathbf{H}\widetilde{B}\mathbf{H}^T \right)^{-1} \mathbf{H} e_i, \quad z_4 = \widetilde{B} \mathbf{H}^T \left( R + \mathbf{H}\widetilde{B}\mathbf{H}^T \right)^{-1} \mathbf{H} z_3,$
  where $e_i \in \mathbb{R}^{N_{state}}$ is the $i$-th cardinality vector.

65. OED Adaptive $B$-Localization (State-Space)
  $\min_{L \in \mathbb{R}^{N_{state}}} \Psi^{B-Loc}(L) + \gamma\, \Phi(L) := \mathrm{Tr}\left( \widetilde{A}(L) \right) + \gamma \|L\|_2, \qquad \text{subject to } l^l_i \le l_i \le l^u_i, \; i = 1, \ldots, N_{state}.$
◮ The design criterion:
  $\Psi^{B-Loc}(L) = \mathrm{Tr}\left( \widetilde{B} \right) - \mathrm{Tr}\left( \widetilde{B} \mathbf{H}^T \left( R + \mathbf{H}\widetilde{B}\mathbf{H}^T \right)^{-1} \mathbf{H} \widetilde{B} \right)$
◮ The gradient:
  $\nabla_L \Psi^{B-Loc} = \sum_{i=1}^{N_{state}} e_i\, l_{B,i}^T \left( I + \mathbf{H}^T R^{-1} \mathbf{H} \widetilde{B} \right)^{-1} \left( I + \widetilde{B} \mathbf{H}^T R^{-1} \mathbf{H} \right)^{-1} e_i,$
  $l_{B,i} = l_i \odot \left( e_i^T B \right)^T, \qquad l_i = \left( \frac{\partial \rho_{i,1}(l_i)}{\partial l_i}, \frac{\partial \rho_{i,2}(l_i)}{\partial l_i}, \ldots, \frac{\partial \rho_{i,N_{state}}(l_i)}{\partial l_i} \right)^T,$
  where $e_i \in \mathbb{R}^{N_{state}}$ is the $i$-th cardinality vector.

66-67. OED Adaptive: Observation-Space Localization
◮ Assume $L \in \mathbb{R}^{N_{obs}}$ is attached to the observation grid points.
◮ $\mathbf{H}B$ is replaced with $\widetilde{\mathbf{H}B} = C_{loc,1} \odot \mathbf{H}B$, with $C_{loc,1} = \left[ \rho^{o|m}_{i,j}(l_i) \right]$; $i = 1, 2, \ldots, N_{obs}$; $j = 1, 2, \ldots, N_{state}$.
◮ $\mathbf{H}B\mathbf{H}^T$ can be replaced with $\widetilde{\mathbf{H}B\mathbf{H}^T} = C_{loc,2} \odot \mathbf{H}B\mathbf{H}^T$, with
  $C_{loc,2} \equiv C^{o|o} := \frac{1}{2}\left( C^o_r + C^o_c \right) = \frac{1}{2}\left[ \rho^{o|o}_{i,j}(l_i) + \rho^{o|o}_{i,j}(l_j) \right]_{i,j = 1, 2, \ldots, N_{obs}}$
◮ Localized posterior covariances:
  ◮ Localize $\mathbf{H}B$:
    $\widetilde{A} = B - \widetilde{\mathbf{H}B}^T \left( R + \mathbf{H}B\mathbf{H}^T \right)^{-1} \widetilde{\mathbf{H}B}$
  ◮ Localize both $\mathbf{H}B$ and $\mathbf{H}B\mathbf{H}^T$:
    $\widetilde{A} = B - \widetilde{\mathbf{H}B}^T \left( R + \widetilde{\mathbf{H}B\mathbf{H}^T} \right)^{-1} \widetilde{\mathbf{H}B}$

68. OED Adaptive $R$-Localization: Decorrelate $\mathbf{H}B$
◮ The design criterion:
  $\Psi^{R-Loc}(L) = \mathrm{Tr}(B) - \mathrm{Tr}\left( \widetilde{\mathbf{H}B}^T \left( R + \mathbf{H}B\mathbf{H}^T \right)^{-1} \widetilde{\mathbf{H}B} \right); \qquad L \in \mathbb{R}^{N_{obs}}$
◮ The gradient:
  $\nabla_L \Psi^{R-Loc} = -2 \sum_{i=1}^{N_{obs}} e_i\, l_{\mathbf{H}B,i}^T\, \psi_i,$
  $\psi_i = \widetilde{\mathbf{H}B}^T \left( R + \mathbf{H}B\mathbf{H}^T \right)^{-1} e_i, \qquad l_{\mathbf{H}B,i} = l^s_i \odot \left( e_i^T \mathbf{H}B \right)^T, \qquad l^s_i = \left( \frac{\partial \rho_{i,1}(l_i)}{\partial l_i}, \frac{\partial \rho_{i,2}(l_i)}{\partial l_i}, \ldots, \frac{\partial \rho_{i,N_{state}}(l_i)}{\partial l_i} \right)^T,$
  where $e_i \in \mathbb{R}^{N_{obs}}$ is the $i$-th cardinality vector.

69. OED Adaptive $R$-Localization: Decorrelate $\mathbf{H}B$ and $\mathbf{H}B\mathbf{H}^T$
◮ The design criterion:
  $\Psi^{R-Loc}(L) = \mathrm{Tr}(B) - \mathrm{Tr}\left( \widetilde{\mathbf{H}B}^T \left( R + \widetilde{\mathbf{H}B\mathbf{H}^T} \right)^{-1} \widetilde{\mathbf{H}B} \right); \qquad L \in \mathbb{R}^{N_{obs}}$
◮ The gradient:
  $\nabla_L \Psi^{R-Loc} = \sum_{i=1}^{N_{obs}} e_i \left( \eta^o_i - 2\, l_{\mathbf{H}B,i}^T \right) \psi^o_i,$
  $\psi^o_i = \widetilde{\mathbf{H}B}^T \left( R + \widetilde{\mathbf{H}B\mathbf{H}^T} \right)^{-1} e_i, \qquad \eta^o_i = l^{o\,T}_{B,i} \left( R + \widetilde{\mathbf{H}B\mathbf{H}^T} \right)^{-1} \widetilde{\mathbf{H}B},$
  $l^o_{B,i} = l^o_i \odot \left( e_i^T \mathbf{H}B\mathbf{H}^T \right)^T, \qquad l^o_i = \left( \frac{\partial \rho_{i,1}(l_i)}{\partial l_i}, \frac{\partial \rho_{i,2}(l_i)}{\partial l_i}, \ldots, \frac{\partial \rho_{i,N_{obs}}(l_i)}{\partial l_i} \right)^T.$

70. Experimental Setup
◮ The model (Lorenz-96):
  $\frac{dx_i}{dt} = x_{i-1}(x_{i+1} - x_{i-2}) - x_i + F; \quad i = 1, 2, \ldots, 40,$
  ◦ $x \in \mathbb{R}^{40}$ is the state vector, with $x_0 \equiv x_{40}$,
  ◦ $F = 8$.
◮ Initial background ensemble & uncertainty:
  ◦ Reference IC: $x_0^{True} = \mathcal{M}_{t=0 \to t=5}\left((-2, \ldots, 2)^T\right)$
  ◦ $B_0 = \sigma_0 I \in \mathbb{R}^{N_{state} \times N_{state}}$, with $\sigma_0 = 0.08\, \|x_0^{True}\|_2$
◮ Observations:
  ◦ $\sigma_{obs} = 5\%$ of the average magnitude of the observed reference trajectory
  ◦ $R = \sigma_{obs} I \in \mathbb{R}^{N_{obs} \times N_{obs}}$
  ◦ Synthetic observations are generated every 20 time steps, with $\mathcal{H}(x) = \mathbf{H}x = (x_1, x_3, x_5, \ldots, x_{37}, x_{39})^T \in \mathbb{R}^{20}$.
◮ EnKF flavor used here: DEnKF with Gaspari-Cohn (GC) localization.
Experiments are implemented in DATeS:
  - http://people.cs.vt.edu/~attia/DATeS/
  - Ahmed Attia and Adrian Sandu, DATeS: A Highly-Extensible Data Assimilation Testing Suite, Geosci. Model Dev. Discuss., https://doi.org/10.5194/gmd-2018-30, in review, 2018.

71. Numerical Results: Benchmark
[Figures: benchmark sweep of average RMSE and KL distance over ensemble size and inflation factor; (a) analysis RMSE and (b) rank histogram of DEnKF with $L = 4$ and $\lambda = 1.05$.]
Benchmark EnKF results: the minimum average RMSE over the interval $[10, 30]$, for every choice of $N_{ens}$, is indicated by a red triangle; blue tripods indicate the minimum KL distance between the analysis rank histogram and a uniformly distributed rank histogram. A space-independent radius of influence $L = 4$ is used.

72. Numerical Results: A-OED Adaptive Space-Time Inflation I
[Figures: RMSE and rank histograms of OED-DEnKF compared with the optimally tuned DEnKF, for penalty parameters $\alpha = 0.14$ and $\alpha = 0.04$.]
The localization radius is fixed to $L = 4$. The optimization penalty parameter $\alpha$ is indicated under each panel.

73. Numerical Results: A-OED Adaptive Space-Time Inflation II
[Figures: box plots of the inflation factors $\lambda_i$ at each time instant, for $\alpha = 0.14$ and $\alpha = 0.04$.]
Box plots expressing the range of values of the inflation coefficients at each time instant, over the testing timespan $[10, 30]$.

74. Numerical Results: A-OED Inflation Regularization I, choosing $\alpha$
[Figure: L-curves of time vs. $\|\lambda\|_1$; legend: $\alpha = 0.14$ (RMSE 0.0547), $\alpha = 0.12$ (0.0548), $\alpha = 0.22$ (0.0552), $\alpha = 0.09$ (0.0555), $\alpha = 0.13$ (0.0558).]
L-curve plots are shown for 25 equidistant values of the penalty parameter in $[0.03, 0.24]$, at every assimilation time instant over the testing timespan. The values of the penalty parameter $\alpha$ that resulted in the 5 smallest average RMSEs, over all experiments carried out with different penalties, are highlighted on the plot and indicated in the legend along with the corresponding average RMSE.

75. Numerical Results: A-OED Inflation Regularization II, choosing $\alpha$
[Figures: L-curves of $\Psi^{Infl}(\lambda)$ vs. $\|\lambda\|_1$ at assimilation cycles 100 and 150; legend: $\alpha = 0.14$ (RMSE 0.0547), $\alpha = 0.12$ (0.0548), $\alpha = 0.22$ (0.0552), $\alpha = 0.09$ (0.0555), $\alpha = 0.13$ (0.0558).]
L-curve plots are shown for 25 equidistant values of the penalty parameter at assimilation cycles 100 and 150, respectively.

76. Numerical Results: A-OED Adaptive Space-Time Localization I, state-space formulation
[Figures: RMSE and rank histograms of OED-DEnKF compared with the optimally tuned DEnKF, for penalty parameters $\gamma = 0$ and $\gamma = 0.001$.]
The inflation factor is fixed to $\lambda = 1.05$. The optimization penalty parameter $\gamma$ is shown under each panel.

77. Numerical Results: A-OED Adaptive Space-Time Localization II, state-space formulation
[Figures: RMSE and rank histogram for $\lambda = 1.05$ and $\gamma = 0.04$, and maps of the localization radii over state variables and time for $\gamma = 0$, $\gamma = 0.001$, and $\gamma = 0.04$.]
Localization radii at each time point, over the testing timespan $[10, 30]$. The optimization penalty parameter $\gamma$ is shown under each panel.

78. Numerical Results: A-OED Adaptive Space-Time Localization, choosing $\gamma$
[Figure: L-curves of time vs. $\|L\|_2$; legend: $\gamma = 0$ (RMSE 0.0569), $\gamma = 0.001$ (0.0613), $\gamma = 0.002$ (0.0630), $\gamma = 0.003$ (0.0685), $\gamma = 0.004$ (0.0743).]
L-curve plots are shown for values of the penalty parameter $\gamma = 0, 0.001, \ldots, 0.34$.

79. Numerical Results: A-OED Adaptive Space-Time Localization I, observation-space formulation
[Figure: RMSE time series comparing localization of $B$, of $\mathbf{H}B$, and of $(\mathbf{H}B, \mathbf{H}B\mathbf{H}^T)$.]
A-OED optimal localization radii $L$ found by solving the OED localization problems in the model state space and in the observation space, respectively. No regularization is applied, i.e., $\gamma = 0$.
