Time Series Modelling and Kalman Filters
Chris Williams
School of Informatics, University of Edinburgh
November 2010
Outline
◮ Stochastic processes
◮ AR, MA and ARMA models
◮ The Fourier view
◮ Parameter estimation for ARMA models
◮ Linear-Gaussian HMMs (Kalman filtering)
◮ Reading: Handout on Time Series Modelling: AR, MA, ARMA and All That
◮ Reading: Bishop 13.3 (but not 13.3.2, 13.3.3)
Example time series
◮ FTSE 100
◮ Meteorology: temperature, pressure, ...
◮ Seismology
◮ Electrocardiogram (ECG)
◮ ...
Stochastic Processes
◮ A stochastic process is a family of random variables X(t), t ∈ T, indexed by a parameter t in an index set T
◮ We will consider discrete-time stochastic processes where T = Z (the integers)
◮ A time series is said to be strictly stationary if the joint distribution of X(t_1), ..., X(t_n) is the same as the joint distribution of X(t_1 + τ), ..., X(t_n + τ) for all t_1, ..., t_n, τ
◮ A time series is said to be weakly stationary if its mean is constant and its autocovariance function depends only on the lag, i.e.
  E[X(t)] = µ  ∀t,    Cov[X(t), X(t + τ)] = γ(τ)
◮ A Gaussian process is a family of random variables, any finite number of which have a joint Gaussian distribution
◮ The ARMA models we will study are stationary Gaussian processes
Autoregressive (AR) Models
◮ Example: the AR(1) process
  x_t = α x_{t−1} + w_t,   where w_t ∼ N(0, σ^2)
◮ By repeated substitution we get
  x_t = w_t + α w_{t−1} + α^2 w_{t−2} + ...
◮ Hence E[X(t)] = 0, and if |α| < 1 the process is stationary with
  Var[X(t)] = (1 + α^2 + α^4 + ...) σ^2 = σ^2 / (1 − α^2)
◮ Similarly
  Cov[X(t), X(t − k)] = α^k Var[X(t − k)] = α^k σ^2 / (1 − α^2)
[Figure: sample paths of AR(1) processes with α = 0.5 and α = −0.5]
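A minimal NumPy sketch (not from the slides) of how such AR(1) sample paths can be generated; the series length, noise variance and random seed are arbitrary choices.

```python
import numpy as np

def simulate_ar1(alpha, n=200, sigma=1.0, seed=0):
    """Sample path of x_t = alpha * x_{t-1} + w_t with w_t ~ N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    w = rng.normal(0.0, sigma, size=n)
    for t in range(1, n):
        x[t] = alpha * x[t - 1] + w[t]
    return x

x_pos = simulate_ar1(0.5)    # smoother path: neighbouring values are positively correlated
x_neg = simulate_ar1(-0.5)   # jagged path: neighbouring values tend to alternate in sign

# On a long run the empirical variance should be close to sigma^2 / (1 - alpha^2) = 1 / 0.75
x_long = simulate_ar1(0.5, n=20000)
print(x_long.var(), 1.0 / (1.0 - 0.5 ** 2))
```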
◮ The general case is an AR(p) process
  x_t = Σ_{i=1}^p α_i x_{t−i} + w_t
◮ Notice how x_t is obtained by a (linear) regression on x_{t−1}, ..., x_{t−p}, hence an autoregressive process
◮ Introduce the backward shift operator B, so that B x_t = x_{t−1}, B^2 x_t = x_{t−2}, etc.
◮ Then the AR(p) model can be written as
  φ(B) x_t = w_t,   where φ(B) = 1 − α_1 B − ... − α_p B^p
◮ The condition for stationarity is that all the roots of φ(B) lie outside the unit circle
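One way to check the stationarity condition numerically is to form φ(B) as a polynomial and inspect its roots; the sketch below (an illustration, not part of the slides) does this with NumPy.

```python
import numpy as np

def ar_is_stationary(alphas):
    """True if all roots of phi(B) = 1 - alpha_1 B - ... - alpha_p B^p
    lie outside the unit circle."""
    # np.roots expects coefficients ordered from the highest power down to the constant
    coeffs = [-a for a in alphas[::-1]] + [1.0]
    return bool(np.all(np.abs(np.roots(coeffs)) > 1.0))

print(ar_is_stationary([0.5]))        # True: AR(1) with |alpha| < 1
print(ar_is_stationary([1.0, -0.5]))  # True: the AR(2) example shown later
print(ar_is_stationary([1.5]))        # False: explosive AR(1)
```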
Yule-Walker Equations
  x_t = Σ_{i=1}^p α_i x_{t−i} + w_t
  x_t x_{t−k} = Σ_{i=1}^p α_i x_{t−i} x_{t−k} + w_t x_{t−k}
◮ Taking expectations (and exploiting stationarity) we obtain
  γ_k = Σ_{i=1}^p α_i γ_{k−i},   k = 1, 2, ...
◮ Use p simultaneous equations to obtain the γ's from the α's. For inference, can solve a linear system to obtain the α's given estimates of the γ's (see the sketch below)
◮ Example: AR(1) process, γ_k = α^k γ_0
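A concrete illustration of the inference direction (a sketch under the usual conventions, not taken from the slides): estimate the γ's from data and solve the p × p linear system for the α's.

```python
import numpy as np

def sample_autocovariances(x, max_lag):
    """Biased sample autocovariances gamma_0, ..., gamma_{max_lag}."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    n = len(x)
    return np.array([np.dot(xc[:n - k], xc[k:]) / n for k in range(max_lag + 1)])

def yule_walker(x, p):
    """Solve gamma_k = sum_i alpha_i gamma_{k-i}, k = 1..p, using gamma_{-i} = gamma_i."""
    gamma = sample_autocovariances(x, p)
    G = np.array([[gamma[abs(k - i)] for i in range(p)] for k in range(p)])
    return np.linalg.solve(G, gamma[1:p + 1])

# Applied to a long simulated AR(1) path, yule_walker(x, 1) should be close to alpha.
```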
[Figure: graphical model illustrating an AR(2) process, with sample paths for α_1 = 0.2, α_2 = 0.1 and for α_1 = 1.0, α_2 = −0.5]
Vector AR processes
  x_t = Σ_{i=1}^p A_i x_{t−i} + G w_t
where the A_i's and G are square matrices
◮ We can in general consider modelling multivariate (as opposed to univariate) time series
◮ An AR(2) process can be written as a vector AR(1) process:
  [ x_t     ]   [ α_1  α_2 ] [ x_{t−1} ]   [ 1  0 ] [ w_t     ]
  [ x_{t−1} ] = [ 1    0   ] [ x_{t−2} ] + [ 0  0 ] [ w_{t−1} ]
◮ In general an AR(p) process can be written as a vector AR(1) process with a p-dimensional state vector (cf. ODEs)
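A small sketch (illustrative, not from the slides) of the companion-matrix construction behind this rewriting, for a general AR(p):

```python
import numpy as np

def companion_matrix(alphas):
    """Transition matrix A of the vector AR(1) form of an AR(p) process,
    acting on the stacked state (x_t, x_{t-1}, ..., x_{t-p+1})."""
    p = len(alphas)
    A = np.zeros((p, p))
    A[0, :] = alphas            # first row holds the AR coefficients
    A[1:, :-1] = np.eye(p - 1)  # sub-diagonal copies the state down by one step
    return A

A = companion_matrix([1.0, -0.5])   # the AR(2) example from the plots
# Stationarity can equivalently be checked via the eigenvalues of A (all inside the unit circle)
print(np.abs(np.linalg.eigvals(A)))
```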
Moving Average (MA) processes
  x_t = Σ_{j=0}^q β_j w_{t−j}   (a linear filter)
      = θ(B) w_t
with scaling so that β_0 = 1 and θ(B) = 1 + Σ_{j=1}^q β_j B^j
[Figure: example of an MA(1) process, showing the w's and the resulting x's]
◮ We have E[X(t)] = 0, and
  Var[X(t)] = (1 + β_1^2 + ... + β_q^2) σ^2
  Cov[X(t), X(t − k)] = E[ Σ_{j=0}^q β_j w_{t−j} Σ_{i=0}^q β_i w_{t−k−i} ]
                      = σ^2 Σ_{j=0}^{q−k} β_{j+k} β_j   for k = 0, 1, ..., q
                      = 0                                for k > q
◮ Note that the covariance "cuts off" for k > q
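The cut-off can be seen empirically; the sketch below (with arbitrary MA(2) coefficients, not from the slides) simulates the process and prints sample autocovariances, which should be near zero beyond lag q = 2.

```python
import numpy as np

def simulate_ma(betas, n=5000, sigma=1.0, seed=0):
    """Simulate x_t = w_t + beta_1 w_{t-1} + ... + beta_q w_{t-q}."""
    rng = np.random.default_rng(seed)
    q = len(betas)
    w = rng.normal(0.0, sigma, size=n + q)
    coeffs = np.concatenate(([1.0], betas))      # beta_0 = 1 by convention
    return np.convolve(w, coeffs, mode='valid')  # length-n filtered series

def autocov(x, k):
    xc = x - x.mean()
    return float(np.mean(xc[:len(x) - k] * xc[k:]))

x = simulate_ma([0.8, 0.4])                         # illustrative MA(2) coefficients
print([round(autocov(x, k), 3) for k in range(5)])  # lags 3 and 4 should be near zero
```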
ARMA(p, q) processes
  x_t = Σ_{i=1}^p α_i x_{t−i} + Σ_{j=0}^q β_j w_{t−j}
  φ(B) x_t = θ(B) w_t
◮ Writing an AR(p) process as an MA(∞) process (see the sketch below):
  φ(B) x_t = w_t
  x_t = (1 − α_1 B − ... − α_p B^p)^{−1} w_t = (1 + β_1 B + β_2 B^2 + ...) w_t
◮ Similarly an MA(q) process can be written as an AR(∞) process
◮ The utility of ARMA(p, q) is its potential parsimony
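A sketch (not from the slides) of computing the first few coefficients of that MA(∞) expansion by the recursion implied by φ(B)(1 + β_1 B + β_2 B^2 + ...) = 1:

```python
import numpy as np

def ar_to_ma(alphas, n_terms=10):
    """First n_terms coefficients (1, beta_1, beta_2, ...) of the MA(infinity)
    expansion (1 - alpha_1 B - ... - alpha_p B^p)^{-1} = 1 + beta_1 B + beta_2 B^2 + ..."""
    p = len(alphas)
    beta = np.zeros(n_terms)
    beta[0] = 1.0
    for j in range(1, n_terms):
        beta[j] = sum(alphas[i] * beta[j - 1 - i] for i in range(min(p, j)))
    return beta

print(ar_to_ma([0.5], 5))   # AR(1): beta_j = alpha^j -> [1, 0.5, 0.25, 0.125, 0.0625]
```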
The Fourier View
◮ ARMA models are linear time-invariant systems. Hence sinusoids are their eigenfunctions (Fourier analysis)
◮ This means it is natural to consider the power spectrum of the ARMA process. The power spectrum S(k) can be determined from the {α}, {β} coefficients
◮ Roughly speaking, S(k) is the amount of power allocated on average to the eigenfunction e^{2πikt}
◮ This is a useful way to understand some properties of ARMA processes, but we will not pursue it further here
◮ If you want to know more, see e.g. Chatfield (1989) chapter 7 or Diggle (1990) chapter 4
Parameter Estimation
◮ Let the vector of observations be x = (x(t_1), ..., x(t_n))^T
◮ Estimate and subtract the constant offset µ̂ if it is non-zero
◮ ARMA models driven by Gaussian noise are Gaussian processes. Thus the likelihood L(x; α, β) is a multivariate Gaussian, and we can optimize it wrt the parameters (e.g. by gradient ascent)
◮ An AR(p) model
  x_t = Σ_{i=1}^p α_i x_{t−i} + w_t
can be viewed as the linear regression of x_t on the p previous time steps, so α and σ^2 can be estimated using linear regression (see the sketch below)
◮ This viewpoint also enables the fitting of nonlinear AR models
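A least-squares sketch of the AR(p) regression viewpoint (illustrative, not from the slides):

```python
import numpy as np

def fit_ar_least_squares(x, p):
    """Fit an AR(p) model by regressing x_t on (x_{t-1}, ..., x_{t-p}).
    Returns estimates of (alpha_1, ..., alpha_p) and of the noise variance sigma^2."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                    # subtract the estimated offset mu-hat
    n = len(x)
    # Column i holds the lag-(i+1) values aligned with the targets x_p, ..., x_{n-1}
    X = np.column_stack([x[p - i - 1:n - i - 1] for i in range(p)])
    y = x[p:]
    alphas, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = float(np.mean((y - X @ alphas) ** 2))
    return alphas, sigma2

# On a long simulated AR(2) series with alpha = (1.0, -0.5), the estimates
# should be close to those values.
```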
Model Order Selection, References
◮ For an MA(q) process there should be a cut-off in the autocorrelation function for lags greater than q
◮ For general ARMA models this is a model order selection problem, discussed in an upcoming lecture
Some useful books:
◮ The Analysis of Time Series: An Introduction. C. Chatfield, Chapman and Hall, 4th edition, 1989
◮ Time Series: A Biostatistical Introduction. P. J. Diggle, Clarendon Press, 1990
Linear-Gaussian HMMs
◮ HMM with continuous state-space and observations
◮ Filtering problem known as Kalman filtering
[Figure: graphical model of the linear dynamical system, with latent states z_1, z_2, z_3, ..., z_N linked by the transition matrix A, each emitting an observation x_1, x_2, x_3, ..., x_N through C]
◮ Dynamical model
  z_{n+1} = A z_n + w_{n+1}
where w_{n+1} ∼ N(0, Γ) is Gaussian noise, i.e. p(z_{n+1} | z_n) ∼ N(A z_n, Γ)
◮ Observation model
  x_n = C z_n + v_n
where v_n ∼ N(0, Σ) is Gaussian noise, i.e. p(x_n | z_n) ∼ N(C z_n, Σ)
◮ Initialization
  p(z_1) ∼ N(µ_0, V_0)
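Putting the three pieces together, a sketch (not from the slides) of sampling from this linear-Gaussian model; the matrices A, Γ, C, Σ and the prior parameters are assumed to be supplied by the user.

```python
import numpy as np

def simulate_lgssm(A, Gamma, C, Sigma, mu0, V0, n, seed=0):
    """Draw latent states z_1..z_n and observations x_1..x_n from the
    linear-Gaussian model: z_{n+1} = A z_n + w, x_n = C z_n + v."""
    rng = np.random.default_rng(seed)
    z = np.zeros((n, A.shape[0]))
    x = np.zeros((n, C.shape[0]))
    z[0] = rng.multivariate_normal(mu0, V0)
    x[0] = rng.multivariate_normal(C @ z[0], Sigma)
    for t in range(1, n):
        z[t] = rng.multivariate_normal(A @ z[t - 1], Gamma)
        x[t] = rng.multivariate_normal(C @ z[t], Sigma)
    return z, x
```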
Inference Problem – filtering
◮ As the whole model is Gaussian, only need to compute means and variances
  p(z_n | x_1, ..., x_n) ∼ N(µ_n, V_n)
  µ_n = E[z_n | x_1, ..., x_n]
  V_n = E[(z_n − µ_n)(z_n − µ_n)^T | x_1, ..., x_n]
◮ Recursive update split into two parts
◮ Time update: p(z_n | x_1, ..., x_n) → p(z_{n+1} | x_1, ..., x_n)
◮ Measurement update: p(z_{n+1} | x_1, ..., x_n) → p(z_{n+1} | x_1, ..., x_n, x_{n+1})
◮ Time update: since z_{n+1} = A z_n + w_{n+1},
  E[z_{n+1} | x_1, ..., x_n] = A µ_n
  cov(z_{n+1} | x_1, ..., x_n) = A V_n A^T + Γ  (defined as P_n)
◮ Measurement update (like the posterior in Factor Analysis):
  µ_{n+1} = A µ_n + K_{n+1} (x_{n+1} − C A µ_n)
  V_{n+1} = (I − K_{n+1} C) P_n
where
  K_{n+1} = P_n C^T (C P_n C^T + Σ)^{−1}
◮ K_{n+1} is known as the Kalman gain matrix
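A direct transcription of these updates into NumPy (a sketch; the handling of the first observation, where the prior N(µ_0, V_0) is combined with x_1 by a measurement update, is an assumed initialisation convention):

```python
import numpy as np

def kalman_filter(xs, A, Gamma, C, Sigma, mu0, V0):
    """Return the filtered means mu_n and covariances V_n for observations xs[0..N-1]."""
    I = np.eye(len(mu0))
    # Measurement update of the prior with the first observation (assumed convention)
    K = V0 @ C.T @ np.linalg.inv(C @ V0 @ C.T + Sigma)
    mu, V = mu0 + K @ (xs[0] - C @ mu0), (I - K @ C) @ V0
    mus, Vs = [mu], [V]
    for x in xs[1:]:
        # Time update: predict z_{n+1} from x_1..x_n
        P = A @ V @ A.T + Gamma
        # Measurement update with the Kalman gain K_{n+1}
        K = P @ C.T @ np.linalg.inv(C @ P @ C.T + Sigma)
        mu = A @ mu + K @ (x - C @ (A @ mu))
        V = (I - K @ C) @ P
        mus.append(mu)
        Vs.append(V)
    return np.array(mus), np.array(Vs)
```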
Simple example
  z_{n+1} = z_n + w_{n+1},   w_n ∼ N(0, 1)
  x_n = z_n + v_n,           v_n ∼ N(0, 1)
  p(z_1) ∼ N(0, σ^2)
In the limit σ^2 → ∞ we find
  µ_3 = (5 x_3 + 2 x_2 + x_1) / 8
◮ Notice how later data has more weight
◮ Compare with z_{n+1} = z_n (so that w_n has zero variance); then
  µ_3 = (x_3 + x_2 + x_1) / 3
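A quick numerical check of this result (a sketch with arbitrary observation values), running the scalar recursion with a very large prior variance standing in for σ^2 → ∞:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # arbitrary observations x_1, x_2, x_3
mu, V = 0.0, 1e8                # prior on z_1 with very large variance
for xn in x:
    K = V / (V + 1.0)                        # scalar Kalman gain (observation noise variance 1)
    mu, V = mu + K * (xn - mu), (1 - K) * V  # measurement update
    V = V + 1.0                              # time update (process noise variance 1)
print(mu, (5 * x[2] + 2 * x[1] + x[0]) / 8)  # both should be 2.5 (up to the finite prior)
```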
Applications
"Much as a coffee filter serves to keep undesirable grounds out of your morning mug, the Kalman filter is designed to strip unwanted noise out of a stream of data." — Barry Cipra, SIAM News 26(5), 1993
◮ Navigational and guidance systems
◮ Radar tracking
◮ Sonar ranging
◮ Satellite orbit determination
Extensions
Dealing with non-linearity:
◮ The Extended Kalman Filter (EKF): if x_n = f(z_n) + v_n where f is a non-linear function, we can linearize f, e.g. around E[z_n | x_1, ..., x_{n−1}]. This works for weak non-linearities
◮ For very non-linear problems, use sampling methods (known as particle filters). See, for example, the work of Blake and Isard on tracking: http://www.robots.ox.ac.uk/~vdg/dynamics.html
It is possible to train KFs using a forward-backward algorithm