Time Series Modelling and Kalman Filters

Chris Williams
School of Informatics, University of Edinburgh
November 2010

Outline

◮ Stochastic processes
◮ AR, MA and ARMA models
◮ The Fourier view
◮ Parameter estimation for ARMA models
◮ Linear-Gaussian HMMs (Kalman filtering)
◮ Reading: Handout on Time Series Modelling: AR, MA, ARMA and All That
◮ Reading: Bishop 13.3 (but not 13.3.2, 13.3.3)

Example time series

◮ FTSE 100
◮ Meteorology: temperature, pressure, ...
◮ Seismology
◮ Electrocardiogram (ECG)
◮ ...

Stochastic Processes

◮ A stochastic process is a family of random variables X(t), t ∈ T, indexed by a parameter t in an index set T
◮ We will consider discrete-time stochastic processes where T = Z (the integers)
◮ A time series is said to be strictly stationary if the joint distribution of X(t_1), ..., X(t_n) is the same as the joint distribution of X(t_1 + τ), ..., X(t_n + τ) for all t_1, ..., t_n, τ
◮ A time series is said to be weakly stationary if its mean is constant and its autocovariance function depends only on the lag, i.e.

    E[X(t)] = μ  for all t,    Cov[X(t), X(t + τ)] = γ(τ)

◮ A Gaussian process is a family of random variables, any finite number of which have a joint Gaussian distribution
◮ The ARMA models we will study are stationary Gaussian processes
Autoregressive (AR) Models

◮ Example AR(1): x_t = α x_{t-1} + w_t, where w_t ~ N(0, σ²)
◮ By repeated substitution we get

    x_t = w_t + α w_{t-1} + α² w_{t-2} + ...

◮ Hence E[X(t)] = 0, and if |α| < 1 the process is stationary with

    Var[X(t)] = (1 + α² + α⁴ + ...) σ² = σ² / (1 - α²)

◮ Similarly

    Cov[X(t), X(t-k)] = α^k Var[X(t-k)] = α^k σ² / (1 - α²)

[Figure: sample paths of an AR(1) process for α = 0.5 and α = -0.5]

AR(p) processes

◮ The general case is an AR(p) process

    x_t = Σ_{i=1}^p α_i x_{t-i} + w_t

◮ Notice how x_t is obtained by a (linear) regression from x_{t-1}, ..., x_{t-p}, hence an autoregressive process
◮ Introduce the backward shift operator B, so that B x_t = x_{t-1}, B² x_t = x_{t-2}, etc
◮ Then the AR(p) model can be written as

    φ(B) x_t = w_t,  where φ(B) = (1 - α_1 B - ... - α_p B^p)

◮ The condition for stationarity is that all the roots of φ(B) lie outside the unit circle

Yule-Walker Equations

◮ Starting from x_t = Σ_{i=1}^p α_i x_{t-i} + w_t, multiply through by x_{t-k}:

    x_t x_{t-k} = Σ_{i=1}^p α_i x_{t-i} x_{t-k} + w_t x_{t-k}

◮ Taking expectations (and exploiting stationarity) we obtain

    γ_k = Σ_{i=1}^p α_i γ_{k-i},    k = 1, 2, ...

◮ Use p simultaneous equations to obtain the γ's from the α's. For inference, can solve a linear system to obtain the α's given estimates of the γ's
◮ Example: AR(1) process, γ_k = α^k γ_0
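A minimal sketch (not from the handout) of the Yule-Walker route just described: simulate an AR(2) process with illustrative coefficients, estimate the sample autocovariances γ_0, ..., γ_p, and solve the resulting linear system for the α's. It assumes NumPy is available; the coefficient values are made up for illustration.

```python
# Sketch: recover AR(p) coefficients from sample autocovariances via
# the Yule-Walker equations gamma_k = sum_i alpha_i gamma_{k-i}.
import numpy as np

rng = np.random.default_rng(0)
alpha_true = np.array([0.5, -0.3])          # AR(2) coefficients (illustrative)
p, sigma, T = len(alpha_true), 1.0, 100_000

# Simulate x_t = alpha_1 x_{t-1} + alpha_2 x_{t-2} + w_t
x = np.zeros(T)
for t in range(p, T):
    x[t] = x[t - p:t][::-1] @ alpha_true + rng.normal(0.0, sigma)

def autocov(x, k):
    """Sample autocovariance at lag k (the process has zero mean here)."""
    return np.mean(x[k:] * x[:len(x) - k])

gamma = np.array([autocov(x, k) for k in range(p + 1)])

# Yule-Walker system: Gamma @ alpha = (gamma_1, ..., gamma_p),
# with Gamma[i, j] = gamma_{|i - j|}
Gamma = np.array([[gamma[abs(i - j)] for j in range(p)] for i in range(p)])
alpha_hat = np.linalg.solve(Gamma, gamma[1:])
print("true:", alpha_true, "  Yule-Walker estimate:", alpha_hat)
```

With a long enough series the estimate should be close to the true coefficients, illustrating the "solve a linear system to obtain the α's given estimates of the γ's" step.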
Graphical model illustrating an AR(2) process

[Figure: graphical model of an AR(2) process — each x_t receives directed edges from x_{t-1} and x_{t-2}]
[Figure: sample paths of AR(2) processes with α_1 = 0.2, α_2 = 0.1 and with α_1 = 1.0, α_2 = -0.5]

Vector AR processes

◮ We can in general consider modelling multivariate (as opposed to univariate) time series

    x_t = Σ_{i=1}^p A_i x_{t-i} + G w_t

  where the A_i's and G are square matrices
◮ An AR(2) process can be written as a vector AR(1) process:

    [ x_t     ]   [ α_1  α_2 ] [ x_{t-1} ]   [ 1  0 ] [ w_t     ]
    [ x_{t-1} ] = [ 1    0   ] [ x_{t-2} ] + [ 0  0 ] [ w_{t-1} ]

◮ In general an AR(p) process can be written as a vector AR(1) process with a p-dimensional state vector (cf ODEs)

Moving Average (MA) processes

    x_t = Σ_{j=0}^q β_j w_{t-j}    (linear filter)
        = θ(B) w_t

  with scaling so that β_0 = 1 and θ(B) = 1 + Σ_{j=1}^q β_j B^j

  Example: MA(1) process
  [Figure: graphical model of an MA(1) process — each x_t depends on w_t and w_{t-1}]

◮ We have E[X(t)] = 0, and

    Var[X(t)] = (1 + β_1² + ... + β_q²) σ²

    Cov[X(t), X(t-k)] = E[ Σ_{j=0}^q β_j w_{t-j} Σ_{i=0}^q β_i w_{t-k-i} ]
                      = σ² Σ_{j=0}^{q-k} β_{j+k} β_j   for k = 0, 1, ..., q
                      = 0                              for k > q

◮ Note that covariance "cuts off" for k > q
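A minimal sketch (not from the handout) illustrating the autocovariance cut-off just noted: simulate an MA(q) process and compare the sample autocovariance at each lag with the formula γ_k = σ² Σ_{j=0}^{q-k} β_{j+k} β_j. It assumes NumPy; the β values are illustrative.

```python
# Sketch: MA(q) autocovariance "cuts off" for lags k > q.
import numpy as np

rng = np.random.default_rng(1)
beta = np.array([1.0, 0.6, -0.4])          # beta_0 = 1, so q = 2 (illustrative)
q, sigma, T = len(beta) - 1, 1.0, 100_000

w = rng.normal(0.0, sigma, size=T + q)
x = np.convolve(w, beta, mode="valid")     # x_t = sum_{j=0}^{q} beta_j w_{t-j}

for k in range(q + 3):
    gamma_theory = sigma**2 * np.sum(beta[k:] * beta[:len(beta) - k]) if k <= q else 0.0
    gamma_sample = np.mean(x[k:] * x[:len(x) - k])
    print(k, round(gamma_sample, 3), round(gamma_theory, 3))
```

The sample values for k = 0, ..., q track the theoretical ones, and for k > q they are close to zero, which is the diagnostic used later for choosing q.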
ARMA(p, q) processes

    x_t = Σ_{i=1}^p α_i x_{t-i} + Σ_{j=0}^q β_j w_{t-j}

    φ(B) x_t = θ(B) w_t

◮ Writing an AR(p) process as an MA(∞) process:

    φ(B) x_t = w_t
    x_t = (1 - α_1 B - ... - α_p B^p)^{-1} w_t
        = (1 + β_1 B + β_2 B² + ...) w_t

◮ Similarly an MA(q) process can be written as an AR(∞) process
◮ Utility of ARMA(p, q) is potential parsimony

The Fourier View

◮ ARMA models are linear time-invariant systems. Hence sinusoids are their eigenfunctions (Fourier analysis)
◮ This means it is natural to consider the power spectrum of the ARMA process. The power spectrum S(k) can be determined from the {α}, {β} coefficients
◮ Roughly speaking, S(k) is the amount of power allocated on average to the eigenfunction e^{2πikt}
◮ This is a useful way to understand some properties of ARMA processes, but we will not pursue it further here
◮ If you want to know more, see e.g. Chatfield (1989) chapter 7 or Diggle (1990) chapter 4

Parameter Estimation

◮ Let the vector of observations be x = (x(t_1), ..., x(t_n))^T
◮ Estimate and subtract a constant offset μ̂ if this is non-zero
◮ ARMA models driven by Gaussian noise are Gaussian processes. Thus the likelihood L(x; α, β) is a multivariate Gaussian, and we can optimize this wrt the parameters (e.g. by gradient ascent)
◮ AR(p) models

    x_t = Σ_{i=1}^p α_i x_{t-i} + w_t

  can be viewed as the linear regression of x_t on the p previous time steps; α and σ² can be estimated using linear regression (see the sketch after the references below)
◮ This viewpoint also enables the fitting of nonlinear AR models

Model Order Selection, References

◮ For an MA(q) process there should be a cut-off in the autocorrelation function for lags greater than q
◮ For general ARMA models this is a model order selection problem, discussed in an upcoming lecture

Some useful books:

◮ The Analysis of Time Series: An Introduction. C. Chatfield, Chapman and Hall, 4th edition, 1989
◮ Time Series: A Biostatistical Introduction. P. J. Diggle, Clarendon Press, 1990
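A minimal sketch (not from the handout) of the regression view of AR(p) fitting referred to above: regress x_t on x_{t-1}, ..., x_{t-p} by ordinary least squares and take the residual variance as the estimate of σ². It assumes NumPy and a zero-mean series; the AR(2) test series and the choice p = 2 are illustrative.

```python
# Sketch: fit an AR(p) model by linear regression on the p previous values.
import numpy as np

def fit_ar_ls(x, p):
    """Return (alpha_hat, sigma2_hat) from regressing x_t on x_{t-1}, ..., x_{t-p}."""
    # Design matrix: row for time t contains x_{t-1}, ..., x_{t-p}
    X = np.column_stack([x[p - i: len(x) - i] for i in range(1, p + 1)])
    y = x[p:]
    alpha_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ alpha_hat
    sigma2_hat = np.mean(resid ** 2)        # noise variance estimate
    return alpha_hat, sigma2_hat

# Example usage on a synthetic AR(2) series (illustrative coefficients):
rng = np.random.default_rng(2)
x = np.zeros(50_000)
for t in range(2, len(x)):
    x[t] = 0.5 * x[t - 1] - 0.3 * x[t - 2] + rng.normal()

print(fit_ar_ls(x, p=2))
```

This is the same least-squares computation as ordinary linear regression, which is why the slide notes that the viewpoint extends directly to nonlinear AR models (replace the linear regression with any regression model).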
Linear-Gaussian HMMs

[Figure: graphical model with latent chain z_1 → z_2 → z_3 → ... → z_N (transition matrix A), each state emitting an observation x_1, x_2, x_3, ..., x_N (observation matrix C)]

◮ HMM with continuous state-space and observations
◮ Filtering problem known as Kalman filtering

◮ Dynamical model

    z_{n+1} = A z_n + w_{n+1}

  where w_{n+1} ~ N(0, Γ) is Gaussian noise, i.e. p(z_{n+1} | z_n) ~ N(A z_n, Γ)
◮ Observation model

    x_n = C z_n + v_n

  where v_n ~ N(0, Σ) is Gaussian noise, i.e. p(x_n | z_n) ~ N(C z_n, Σ)
◮ Initialization

    p(z_1) ~ N(μ_0, V_0)

  (a simulation sketch of this model follows below)

Inference Problem – filtering

◮ As the whole model is Gaussian, we only need to compute means and variances

    p(z_n | x_1, ..., x_n) ~ N(μ_n, V_n)
    μ_n = E[z_n | x_1, ..., x_n]
    V_n = E[(z_n - μ_n)(z_n - μ_n)^T | x_1, ..., x_n]

◮ Recursive update split into two parts
◮ Time update

    p(z_n | x_1, ..., x_n) → p(z_{n+1} | x_1, ..., x_n)

◮ Measurement update

    p(z_{n+1} | x_1, ..., x_n) → p(z_{n+1} | x_1, ..., x_n, x_{n+1})
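A minimal simulation sketch (not from the slides) of the model just defined: it draws a state trajectory and observations from z_{n+1} = A z_n + w_{n+1}, x_n = C z_n + v_n. It assumes NumPy, and the particular matrices (a constant-velocity state with position-only observations) are illustrative choices, not taken from the lecture.

```python
# Sketch: sample a trajectory from a linear-Gaussian state-space model.
import numpy as np

rng = np.random.default_rng(3)

A = np.array([[1.0, 1.0],                  # state transition (position, velocity)
              [0.0, 1.0]])
C = np.array([[1.0, 0.0]])                 # observe position only
Gamma = 0.01 * np.eye(2)                   # process noise covariance
Sigma = np.array([[0.5]])                  # observation noise covariance
mu0, V0 = np.zeros(2), np.eye(2)           # p(z_1) = N(mu0, V0)

N = 100
z = rng.multivariate_normal(mu0, V0)
zs, xs = [], []
for n in range(N):
    x = C @ z + rng.multivariate_normal(np.zeros(1), Sigma)   # x_n = C z_n + v_n
    zs.append(z); xs.append(x)
    z = A @ z + rng.multivariate_normal(np.zeros(2), Gamma)   # z_{n+1} = A z_n + w_{n+1}

zs, xs = np.array(zs), np.array(xs)        # shapes (N, 2) and (N, 1)
```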
◮ Time update: from z_{n+1} = A z_n + w_{n+1},

    E[z_{n+1} | x_1, ..., x_n] = A μ_n
    cov(z_{n+1} | x_1, ..., x_n) := P_n = A V_n A^T + Γ

◮ Measurement update (like the posterior in Factor Analysis)

    μ_{n+1} = A μ_n + K_{n+1} (x_{n+1} - C A μ_n)
    V_{n+1} = (I - K_{n+1} C) P_n

  where

    K_{n+1} = P_n C^T (C P_n C^T + Σ)^{-1}

◮ K_{n+1} is known as the Kalman gain matrix
  (these updates are implemented in the sketch at the end of this section)

Simple example

    z_{n+1} = z_n + w_{n+1},  w_n ~ N(0, 1)
    x_n = z_n + v_n,          v_n ~ N(0, 1)
    p(z_1) ~ N(0, σ²)

◮ In the limit σ² → ∞ we find

    μ_3 = (5 x_3 + 2 x_2 + x_1) / 8

◮ Notice how later data has more weight
◮ Compare with z_{n+1} = z_n (so that w_n has zero variance); then

    μ_3 = (x_3 + x_2 + x_1) / 3

Applications

  "Much as a coffee filter serves to keep undesirable grounds out of your morning mug, the Kalman filter is designed to strip unwanted noise out of a stream of data." Barry Cipra, SIAM News 26(5), 1993

◮ Navigational and guidance systems
◮ Radar tracking
◮ Sonar ranging
◮ Satellite orbit determination

Extensions

Dealing with non-linearity

◮ The Extended Kalman Filter (EKF): if x_n = f(z_n) + v_n where f is a non-linear function, we can linearize f, e.g. around E[z_n | x_1, ..., x_{n-1}]. Works for weak non-linearities
◮ For very non-linear problems use sampling methods (known as particle filters). Example: work of Blake and Isard on tracking, see http://www.robots.ox.ac.uk/~vdg/dynamics.html
◮ It is possible to train KFs using a forward-backward algorithm
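A minimal sketch (not from the slides) of the filtering recursion, transcribing the time and measurement updates above. It assumes NumPy and reuses the illustrative model matrices A, C, Γ, Σ, μ_0, V_0 and the observations xs from the simulation sketch earlier; variable names mirror the slide notation.

```python
# Sketch: Kalman filter as alternating time and measurement updates.
import numpy as np

def kalman_filter(xs, A, C, Gamma, Sigma, mu0, V0):
    """Return filtered means mu_n and covariances V_n, one per observation."""
    mu_pred, P = mu0, V0                    # predictive moments for z_1
    means, covs = [], []
    for x in xs:
        # Measurement update
        K = P @ C.T @ np.linalg.inv(C @ P @ C.T + Sigma)   # Kalman gain K_{n+1}
        mu = mu_pred + K @ (x - C @ mu_pred)
        V = (np.eye(len(mu0)) - K @ C) @ P
        means.append(mu); covs.append(V)
        # Time update for the next step
        mu_pred = A @ mu                    # E[z_{n+1} | x_1, ..., x_n] = A mu_n
        P = A @ V @ A.T + Gamma             # P_n = A V_n A^T + Gamma
    return np.array(means), np.array(covs)

# Example usage with the matrices and observations from the simulation sketch:
# mus, Vs = kalman_filter(xs, A, C, Gamma, Sigma, mu0, V0)
```

Because everything stays Gaussian, these mean and covariance recursions are the entire filtering computation; running the filter on the simple random-walk example above with a broad prior reproduces weights like μ_3 = (5 x_3 + 2 x_2 + x_1) / 8.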