Data-driven stochastic model reduction Fei Lu joint with Alexandre J. Chorin and Kevin Lin UC Berkeley Lawrence Berkeley National Laboratory U of Arizona July 18, 2016 ICERM, Brown University 1 / 22
Outline Part I: Motivation for stochastic model reduction Part II: A discrete-time approach Part III: Examples 2 / 22
Motivation Climate modeling, numerical weather prediction: High-dimensional Discrete partial Prediction Full system data d Forecast dt x = f ( x ) + U ( x , y ) , Observe only { x ( n δ ) } N n = 1 . x ( t ) , t ≥ N δ . d dt y = g ( x , y ) . Possible approaches: Estimate y , and solve the full system ◮ The full system is expensive to solve ◮ The full system may be unknown Construct a reduced model for x 3 / 22
Why stochastic models Data { x ( n δ ) } ⇒ properties of the full system: long-term statistics (ergodicity) N � 1 F ( x ( n i δ )) → E [ F ( x )] N i = 1 positive Lyapunov exponents the effects of y act like random forces Goal: develop a stochastic reduced system for x that can reproduce long-term statistics; make medium range predictions. 4 / 22
Outline Part I: Motivation for stochastic model reduction Part II: A discrete-time approach Previous work: continuous-time approach Discrete-time stochastic parametrization Part III: Examples 5 / 22
The Mori-Zwanzig Formalism ( Mori (65) and Zwanzig (73) ) Rewrite the deterministic system in a form which resembles a generalized Langevin equation ( Chorin-Hald (06) ): Z t dX t dt = f ( X t ) + Γ( t − s , X s ) ds + W t 0 Markovian memory noise Chorin et al (00), Zwanzig(01), Chorin-Hald(06) equation + a projection operator + Duhamel’ Message: memory exists in a closed system for x . Multi-level regression Atmospheric sciences: Kondrashov et al (05-15), Wilks (05) Hypoelliptic SDEs ( Majda-Harlim (13,14) ) dX t = f ( X t ) dt + V t dt , dV t = a ( X t , V t ) dt + σ dW t . 6 / 22
Continuous-time approach: Discrete partial data dX t = f ( X t ) dt + V t dt , parameter estimation structure dV t = a ( X t , V t ) dt + σ dW t . Prediction derivation A continuous-time A discrete-time stochastic system stochastic system discretization 7 / 22
Continuous-time approach: Discrete partial data dX t = f ( X t ) dt + V t dt , parameter estimation structure dV t = a ( X t , V t ) dt + σ dW t . Prediction derivation A continuous-time A discrete-time stochastic system stochastic system discretization Discrete-time approach: Discrete partial data X n = Φ( X n − p : n − 1 , ξ n − q : n − 1 ) + ξ n parameter structure estimation derivation Prediction A discrete-time stochastic system 8 / 22
Discrete-time stochastic parametrization Identify a discrete-time stochastic system: NARMA ( Chorin-Lu (15) ) X n = X n − 1 + δ R δ ( X n − 1 ) + δ Z n , Z n = Φ n + ξ n , p q r s � � � � Φ n = a j Z n − j + b i , j P i ( X n − j ) + c j ξ n − j ; j = 1 j = 1 i = 1 j = 1 R δ ( X n − 1 ) from a discrete scheme for x ′ ≈ f ( x ) ; Φ n = Φ( Z n − p : n − 1 , X n − r : n − 1 , ξ n − q : n − 1 ) depends on the past; ξ n are i.i.d N ( 0 , σ 2 ) ; a Nonlinear AutoRegression Moving Average model (NARMA). Structure: terms in Φ n ; Parameters: a j , b j , c j , and σ . 9 / 22
X n = X n − 1 + δ R δ ( X n − 1 ) + δ Z n , Z n = Φ n + ξ n , p r s q � � � � Φ n = a j Z n − j + b i , j P i ( X n − j ) + c j ξ n − j ; j = 1 j = 1 i = 1 j = 1 Parameter estimation: conditional maximum likelihood estimator (MLE) Conditional on ξ 1 , . . . , ξ q , the log-likelihood of X q + 1 : N is l ( θ | ξ 1 , . . . , ξ q ) = − � N | Z n − Φ n | 2 − N − q ln σ 2 . n = q + 1 2 σ 2 2 when q = 0, MLE = least squares estimator (LSE) when q > 0, set ξ 1 = · · · = ξ q = 0 ◮ ergodicity ⇒ MLE is consistent ⇒ independent of ξ 1 , . . . , ξ q Structure derivation/selection: stability, long-term statistics numerical schemes analytical properties of the full system 10 / 22
Outline Part I: Motivation for stochastic model reduction Part II: A discrete-time approach Part III: Examples: The Lorenz 96 system The Kuramoto-Sivashinsky equation 11 / 22
Example I: the Lorenz 96 system � dt x k = x k − 1 ( x k + 1 − x k − 2 ) − x k + 10 − 1 d y j , k , J j dt y j , k = 1 d ε [ y j + 1 , k ( y j − 1 , k − y j + 2 , k ) − y j , k + x k ] , where x ∈ R K , y ∈ R JK . a chaotic system K = 18 , J = 20: y ∈ R 360 Find a reduced system for x ∈ R 18 based on ➣ Data { x ( n δ ) } N n = 1 ➣ d dt x k ≈ x k − 1 ( x k + 1 − x k − 2 ) − x k + 10. Wilks, 2005 12 / 22
� NARMA [Chorin-Lu (15)] : x n − 1 + δ R δ ( x n − 1 ) + δ z n ; z n = Φ n + ξ n , x n = p r d x � � � a j z n − j + b j , l ( x n − j ) l Φ n = µ + j = 1 j = 1 l = 1 d R q s � � � c j , l ( R δ ( x n − j )) l + d j ξ n − j . + j = 1 l = 1 j = 1 � Polynomial autoregression (POLYAR) [Wilks (05)]: d = x k − 1 ( x k + 1 − x k − 2 ) − x k + 10 + U , dt x k U = P ( x k ) + η k , with η k ( t + δ ) = φη k ( t ) + ξ ( t ) where P ( x ) = � d j = 0 a j x j ; ξ ( t ) ∼ N ( 0 , σ 2 ) . 13 / 22
Long-term statistics run the reduced systems for a long time; compare with the data: 0.14 1 L96 L96 POLYAR POLYAR NARMAX NARMAX 0.8 0.12 0.6 0.1 0.4 0.08 pdf ACF 0.2 0.06 0 0.04 − 0.2 0.02 − 0.4 0 − 0.6 − 10 − 5 0 5 10 15 0 1 2 3 4 5 6 7 8 9 10 x 1 time empirical probability density function autocorrelation function 14 / 22
Short-term prediction A typical ensemble forecast: RMSE of many forecasts: 20 10 POLYAR 18 5 X 1 16 POLYAR 0 14 − 5 12 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 time RMSE 10 NARMAX 8 10 NARMAX 6 5 X 1 4 0 2 − 5 0 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 lead time time Forecast time: forecast trajectories in cyan POLYAR: t ≈ 1 true trajectory in blue NARMA: t ≈ 2 . 5 15 / 22
Example II: the Kuramoto-Sivashinsky equation The KSE with periodic BC: v ( x , t ) = v ( x + 2 πν, t ) ∂ t + ∂ 2 v ∂ x 2 + ∂ 4 v ∂ v ∂ x 4 + v ∂ v ∂ x = 0 , x ∈ R , t > 0 . spatio-temporally chaotic: Fourier: v ( x , t ) = � + ∞ k = −∞ ˆ v k e iq k x , (here q k = k ν ) ∞ d v k − iq k v k = ( q 2 k − q 4 � dt ˆ k )ˆ ˆ v l ˆ v k − l . 2 l = −∞ Problem setting: ν = 3 . 43 � observe only K = 5 modes, (ˆ v k , k = 1 , . . . , K ) � predict their evolution 16 / 22
The truncated system is not accurate: d v k − iq k v k = ( q 2 k − q 4 � dt ˆ k )ˆ ˆ v l ˆ | k | ≤ K . v k − l , 2 | l |≤ K , | k − l |≤ K NARMA compute R δ ( X n − 1 ) ( from the K -mode X n = X n − 1 + δ R δ ( X n − 1 ) + δ Z n , truncated system) Z n = Φ n + ξ n , structure for Φ n : p q � � a j Z n − j + c j ξ n − j Φ n = ◮ different modes have different j = 1 j = 1 dynamics r s � � b i , j P i ( X n − j ) . + ◮ nonlinear interaction between j = 1 i = 1 modes 17 / 22
Large time behavior of the KSE ( Constantin, Jolly, Kevrekidis, Titi et al (88-94) ) Inertial manifold M Let v = u + w . Rewrite the KSE: if M is the graph of a function du dt = Au + Pf ( u + w ) w = ψ ( u ) , du dw dt = Au + Pf ( u + ψ ( u )) . dt = Aw + Qf ( u + w ) Approximate inertial manifolds (AIMs) Approximate the function w = ψ ( u ) : dw dt ≈ 0 ⇒ w ≈ A − 1 Qf ( u + w ) , Fixed point: ψ 0 = 0; ψ n + 1 = A − 1 Qf ( u + ψ n ) . An accurate AIM requires m = dim ( u ) to be large! ( distance to the attractor ∼ | q m | − γ ) 18 / 22
NARMA with AIMs ( Lu-Lin-Chorin (15) ) The AIMs hint at how the high modes depend on the low modes: � � � A − 1 Qf ( u ) ˆ k ⇒ ˆ v l ˆ ˆ | k | > K : v k ≈ ψ 1 , k = v k ≈ c k v k − l . 1 ≤| l | , | k − l |≤ K Important: keep the terms, and estimate the coefficients � u n 1 ≤ j ≤ K ; j , u n � i � K j = l = j − K u n l u n j − l , K < j ≤ 2 K . A discrete-time stochastic system: u n k = u n − 1 + δ R δ k ( u n − 1 ) + δ z n k , k z n k = Φ n k + ξ n k , p r K � � � Φ n a k , j z n − j b k , j u n − j u n u n c k , j � j + K � k = µ k + + + j + K − k k k j = 1 j = 0 j = 1 q � k ( u n ) + d k , j ξ n − j + c k , ( K + 1 ) R δ . k j = 1 19 / 22
Long-term statistics: data data truncated system truncated system 0.8 NARMAX NARMAX 0 10 0.6 ACF pdf 0.4 0.2 − 2 10 0 − 0.2 − 0.4 − 0.2 0 0.2 0.4 0.6 0 10 20 30 40 50 v x 4 time lead time auto-correlation function probability density function 20 / 22
Short-term prediction A typical forecast: RMSE of many forecasts: 0.5 the truncated system 15 v 4 0 the truncated system − 0.5 RMSE 10 20 40 60 80 ensemble size = 1 0.4 NARMA 0.2 5 NARMA 0 v 4 − 0.2 ensemble size = 20 − 0.4 0 20 40 60 80 20 40 60 80 lead time time time t Forecast time: the truncated system: t ≈ 15 the NARMA system: t ≈ 50 21 / 22
Summary Data-driven stochastic model reduction: High-dimensional Full system Discrete-time stochastic Prediction reduced system Discrete partial data X n = Φ( X n − p : n − 1 , ξ n − q : n − 1 ) + Ψ( X n − p : n − 1 , ξ n − q : n − 1 ) ξ n integrate numerical scheme into statistical inference easy parameter estimation flexible structure selection Thanks for your attention! 22 / 22
Recommend
More recommend