time series modelling and kalman filters
play

Time Series Modelling and Kalman Filters Chris Williams School of - PowerPoint PPT Presentation

Time Series Modelling and Kalman Filters Chris Williams School of Informatics, University of Edinburgh November 2010 1 / 24 Outline Stochastic processes AR, MA and ARMA models The Fourier view Parameter estimation for ARMA


  1. Time Series Modelling and Kalman Filters Chris Williams School of Informatics, University of Edinburgh November 2010 1 / 24

  2. Outline ◮ Stochastic processes ◮ AR, MA and ARMA models ◮ The Fourier view ◮ Parameter estimation for ARMA models ◮ Linear-Gaussian HMMs (Kalman filtering) ◮ Reading: Handout on Time Series Modelling: AR, MA, ARMA and All That ◮ Reading: Bishop 13.3 (but not 13.3.2, 13.3.3) 2 / 24

  3. Example time series ◮ FTSE 100 ◮ Meteorology: temperature, pressure ... ◮ Seismology ◮ Electrocardiogram (ECG) ◮ . . . 3 / 24

  4. Stochastic Processes ◮ A stochastic process is a family of random variables X ( t ) , t ∈ T indexed by a parameter t in an index set T ◮ We will consider discrete-time stochastic processes where T = Z (the integers) ◮ A time series is said to be strictly stationary if the joint distribution of X ( t 1 ) , . . . , X ( t n ) is the same as the joint distribution of X ( t 1 + τ ) , . . . , X ( t n + τ ) for all t 1 , . . . , t n , τ ◮ A time series is said to be weakly stationary if its mean is constant and its autocovariance function depends only on the lag, i.e. E [ X ( t )] = µ ∀ t Cov [ X ( t ) X ( t + τ )] = γ ( τ ) ◮ A Gaussian process is a family of random variables, any finite number of which have a joint Gaussian distribution ◮ The ARMA models we will study are stationary Gaussian processes 4 / 24

  5. Autoregressive (AR) Models ◮ Example AR(1) x t = α x t − 1 + w t where w t ∼ N ( 0 , σ 2 ) ◮ By repeated substitution we get x t = w t + α w t − 1 + α 2 w t − 2 + . . . ◮ Hence E [ X ( t )] = 0, and if | α | < 1 the process is stationary with σ 2 Var [ X ( t )] = ( 1 + α 2 + α 4 + . . . ) σ 2 = 1 − α 2 ◮ Similarly Cov [ X ( t ) X ( t − k )] = α k Var [ X ( t − k )] = α k σ 2 1 − α 2 5 / 24

  6. α = 0 . 5 α = − 0 . 5 6 / 24

  7. ◮ The general case is an AR( p ) process p � x t = α i x t − i + w t i = 1 ◮ Notice how x t is obtainied by a (linear) regression from x t − 1 , . . . , x t − p , hence an autoregressive process ◮ Introduce the backward shift operator B , so that Bx t = x t − 1 , B 2 x t = x t − 2 etc ◮ Then AR( p ) model can be written as φ ( B ) x t = w t where φ ( B ) = ( 1 − α 1 B . . . − α p B p ) ◮ The condition for stationarity is that all the roots of φ ( B ) lie outside the unit circle 7 / 24

  8. Yule-Walker Equations p � x t = α i x t − i + w t i = 1 p � x t x t − k = α i x t − i x t − k + w t x t − k i = 1 ◮ Taking expectations (and exploiting stationarity) we obtain p � γ k = α i γ k − i k = 1 , , 2 , . . . i = 1 ◮ Use p simultaneous equations to obtain the γ ’s from the α ’s. For inference, can solve a linear system to obtain the α ’s given estimates of the γ ’s ◮ Example: AR(1) process, γ k = α k γ 0 8 / 24

  9. Graphical model illustrating an AR(2) process . . . . . . AR2: α 1 = 0 . 2 , α 2 = 0 . 1 AR2: α 1 = 1 . 0 , α 2 = − 0 . 5 9 / 24

  10. Vector AR processes p � x t = A i x t − i + G w t i = 1 where the A i s and G are square matrices ◮ We can in general consider modelling multivariate (as opposed to univariate) time series ◮ An AR(2) process can be written as a vector AR(1) process: � α 1 � � x t − 1 � 1 � � � � � � x t 0 w t α 2 = + x t − 1 1 0 x t − 2 0 0 w t − 1 ◮ In general an AR( p ) process can be written as a vector AR(1) process with a p -dimensional state vector (cf ODEs) 10 / 24

  11. Moving Average (MA) processes q � x t = ( linear filter ) β j w t − j j = 0 = θ ( B ) w t with scaling so that β 0 = 1 and θ ( B ) = 1 + � q j = 1 β j B j Example: MA(1) process w’s x’s 11 / 24

  12. ◮ We have E [ X ( t )] = 0, and Var [ X ( t )] = ( 1 + β 2 1 + . . . + β 2 q ) σ 2 q q � � Cov [ X ( t ) X ( t − k )] = E [ β j w t − j , β i w t − k − i ] j = 0 i = 0 � σ 2 � q − k j = 0 β j + k β j for k = 0 , 1 , . . . , q = 0 for k > q ◮ Note that covariance “cuts off” for k > q 12 / 24

  13. ARMA( p , q ) processes p q � � x t = α i x t − i + β j w t − j i = 1 j = 0 φ ( B ) x t = θ ( B ) w t ◮ Writing an AR( p ) process as a MA( ∞ ) process φ ( B ) x t = w t x t = ( 1 − α 1 B . . . α p B p ) − 1 w t = ( 1 + β 1 B + β 2 B 2 . . . ) w t ◮ Similarly a MA( q ) process can be written as a AR( ∞ ) process ◮ Utility of ARMA( p , q ) is potential parsimony 13 / 24

  14. The Fourier View ◮ ARMA models are linear time-invariant systems. Hence sinusoids are their eigenfunctions (Fourier analysis) ◮ This means it is natural to consider the power spectrum of the ARMA process. The power spectrum S ( k ) can be determined from the { α } , { β } coefficients ◮ Roughly speaking S ( k ) is the amount of power allocated on average to the eigenfunction e 2 π ikt ◮ This is a useful way to understand some properties of ARMA processes, but we will not pursue it further here ◮ If you want to know more, see e.g. Chatfield (1989) chapter 7 or Diggle (1990) chapter 4 14 / 24

  15. Parameter Estimation ◮ Let the vector of observations x = ( x ( t 1 ) , . . . , x ( t n )) T ◮ Estimate and subtract constant offset ˆ µ if this is non zero ◮ ARMA models driven by Gaussian noise are Gaussian processes. Thus the likelihood L ( x ; α , β ) is a multivariate Gaussian, and we can optimize this wrt the parameters (e.g. by gradient ascent) ◮ AR( p ) models, p � x t = α i x t − i + w t i = 1 can be viewed as the linear regression of x t on the p previous time steps, α and σ 2 can be estimated using linear regression ◮ This viewpoint also enables the fitting of nonlinear AR models 15 / 24

  16. Model Order Selection, References ◮ For a MA( q ) process there should be a cut-off in the autocorrelation function for lags greater than q ◮ For general ARMA models this is model order selection problem, discussed in an upcoming lecture Some useful books: ◮ The Analysis of Time Series: An Introduction. C. Chatfield, Chapman and Hall, 4th edition, 1989 ◮ Time Series: A Biostatistical Introduction. P . J. Diggle, Clarendon Press, 1990 16 / 24

  17. Linear-Gaussian HMMs ◮ HMM with continuous state-space and observations ◮ Filtering problem known as Kalman filtering 17 / 24

  18. z z z z N 1 2 3 . . A A C C C C x x x x N 1 2 3 ◮ Dynamical model z n + 1 = A z n + w n + 1 where w n + 1 ∼ N ( 0 , Γ) is Gaussian noise, i.e. p ( z n + 1 | z n ) ∼ N ( A z n , Γ) 18 / 24

  19. ◮ Observation model x n = C z n + v n where v n ∼ N ( 0 , Σ) is Gaussian noise, i.e. p ( x n | z n ) ∼ N ( C z n , Σ) ◮ Initialization p ( z 1 ) ∼ N ( µ 0 , V 0 ) 19 / 24

  20. Inference Problem – filtering ◮ As whole model is Gaussian, only need to compute means and variances p ( z n | x 1 , . . . , x n ) ∼ N ( µ n , V n ) µ n = E [ z n | x 1 , . . . , x n ] V n = E [( z n − µ n )( z n − µ n ) T | x 1 , . . . , x n ] ◮ Recursive update split into two parts ◮ Time update p ( z n | x 1 , . . . , x n ) → p ( z n + 1 | x 1 , . . . , x n ) ◮ Measurement update p ( z n + 1 | x 1 , . . . , x n ) → p ( z n + 1 | x 1 , . . . , x n , x n + 1 ) 20 / 24

  21. ◮ Time update z n + 1 = A z n + w n + 1 thus E [ z n + 1 | x 1 , . . . x n ] = A µ n = P n = AV n A T + Γ def cov ( z n + 1 | x 1 , . . . x n ) ◮ Measurement update (like posterior in Factor Analysis) µ n + 1 = A µ n + K n + 1 ( x n + 1 − CA µ n ) V n + 1 = ( I − K n + 1 C ) P n where K n + 1 = P n C T ( CP n C T + Σ) − 1 ◮ K n + 1 is known as the Kalman gain matrix 21 / 24

  22. Simple example z n + 1 = z n + w n + 1 w n ∼ N ( 0 , 1 ) x n = z n + v n v n ∼ N ( 0 , 1 ) p ( z 1 ) ∼ N ( 0 , σ 2 ) In the limit σ 2 → ∞ we find µ 3 = 5 x 3 + 2 x 2 + x 1 8 ◮ Notice how later data has more weight ◮ Compare z n + 1 = z n (so that w n has zero variance); then µ 3 = x 3 + x 2 + x 1 3 22 / 24

  23. Applications Much as a coffee filter serves to keep undesirable grounds out of your morning mug, the Kalman filter is designed to strip unwanted noise out of a stream of data. Barry Cipra, SIAM News 26(5) 1993 ◮ Navigational and guidance systems ◮ Radar tracking ◮ Sonar ranging ◮ Satellite orbit determination 23 / 24

  24. Extensions Dealing with non-linearity ◮ The Extended Kalman Filter (EKF) If x n = f ( z n ) + v n where f is a non-linear function, can linearize f , e.g. around E [ z n | x 1 , . . . x n − 1 ] . Works for weak non-linearities ◮ For very non-linear problems use sampling methods (known as particle filters). Example, work of Blake and Isard on tracking, see http://www.robots.ox.ac.uk/ ∼ vdg/dynamics.html It is possible to train KFs using a forward-backward algorithm 24 / 24

Recommend


More recommend