Moving average MA(1) model

Warning: not to be confused with the moving average of slide 17.

Exercise
Consider a white noise (ω_t)_{t∈Z} ∼ WN(0, σ²) and construct the MA(1) process as
    X_t = ω_t + θ ω_{t−1},  ∀ t ∈ Z.
◮ Is it stationary?
◮ Compute its ACF.
Autoregressive AR(1) model

Exercise
Consider a white noise (ω_t)_{t∈Z} ∼ WN(0, σ²) and construct the AR(1) as
    X_t = φ X_{t−1} + ω_t,  ∀ t ∈ Z.
Assume that it is stationary and compute
◮ its mean function
◮ its ACF.
A simulation sketch follows below.
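A minimal R sketch (not part of the original slides): simulate the MA(1) and AR(1) processes of the two exercises above and compare their empirical ACFs with the theoretical ones; the seed and parameter values are arbitrary choices for illustration.

```r
set.seed(1)                 # arbitrary seed, for reproducibility
n     <- 1000
theta <- 0.9
phi   <- 0.7

x_ma <- arima.sim(model = list(ma = theta), n = n)   # X_t = w_t + theta * w_{t-1}
x_ar <- arima.sim(model = list(ar = phi),   n = n)   # X_t = phi * X_{t-1} + w_t

## Empirical ACFs
acf(x_ma, lag.max = 10, plot = FALSE)
acf(x_ar, lag.max = 10, plot = FALSE)

## Theoretical ACFs for comparison
ARMAacf(ma = theta, lag.max = 10)   # rho(1) = theta / (1 + theta^2), zero afterwards
ARMAacf(ar = phi,   lag.max = 10)   # rho(h) = phi^h
```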
Linear processes

Consider a white noise (ω_t)_{t∈Z} ∼ WN(0, σ²) and define the linear process X as follows
    X_t = µ + Σ_{j∈Z} ψ_j ω_{t−j},  ∀ t ∈ Z,                    (1)
where µ ∈ R and (ψ_j) satisfies Σ_{j∈Z} |ψ_j| < ∞.

Theorem
The series in Equation (1) converges in L², and the linear process X defined above is stationary (see Proposition 3.1.2 in [BD13]).

Exercise
Compute the mean and autocovariance functions of (X_t)_{t∈Z}.
Examples of linear processes

Exercise
◮ Show that the following processes are particular linear processes:
    ◮ the white noise process
    ◮ the MA(1) process.
◮ Consider a linear process as defined on slide 29, put µ = 0,
        ψ_j = φ^j  if j ≥ 0,    ψ_j = 0  if j < 0,
  and suppose |φ| < 1. Show that X is in fact an AR(1) process.
Estimation

Suppose that X is a stationary time series and recall that
    µ_X(t) = µ,   γ_X(h) = Cov(X_t, X_{t+h})   and   ρ_X(h) = γ_X(h) / γ_X(0)
for all t, h ∈ Z.

From observations X_1, ..., X_n (from the stationary time series X), we can compute
◮ the sample mean  X̄ = (1/n) Σ_{t=1}^n X_t
◮ the sample autocovariance function
    γ̂_X(h) = (1/n) Σ_{t=1}^{n−|h|} (X_{t+|h|} − X̄)(X_t − X̄),   ∀ −n < h < n
◮ the sample autocorrelation function  ρ̂_X(h) = γ̂_X(h) / γ̂_X(0).
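A minimal R sketch: the estimators of this slide computed "by hand" and checked against R's acf() (which uses the same 1/n convention); the data x and the helper names gamma_hat, rho_hat are illustrative choices.

```r
set.seed(2)
x    <- rnorm(200)          # any observed series would do; here white noise
n    <- length(x)
xbar <- mean(x)

gamma_hat <- function(h) {
  h <- abs(h)
  sum((x[(1 + h):n] - xbar) * (x[1:(n - h)] - xbar)) / n   # note the 1/n factor
}
rho_hat <- function(h) gamma_hat(h) / gamma_hat(0)

rho_hat(1)
acf(x, lag.max = 1, plot = FALSE)$acf[2]   # same value as rho_hat(1)
```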
Warning: γ_X(h) = Cov(X_t, X_{t+h}), but the sample autocovariance function is not the corresponding empirical covariance!

    (1/n) Σ_{t=1}^{n−|h|} (X_{t+|h|} − X̄)(X_t − X̄)
  ≠ (1/(n−|h|)) Σ_{t=1}^{n−|h|} [ X_{t+|h|} − (1/(n−|h|)) Σ_{s=1}^{n−|h|} X_{s+|h|} ] [ X_t − (1/(n−|h|)) Σ_{s=1}^{n−|h|} X_s ]
Examples of sample ACF

Exercise
Can you match the generating time series models (white noise, MA(1), AR(1), random walk with drift) with the sample ACFs below?

Fig.: Sample ACF 1, Sample ACF 2, Sample ACF 3 and Sample ACF 4 (ACF against lag, lags 0 to 20).
Examples of sample ACF

Fig.: The ACF of the speech data example of slide 8.

Notice the regular repetition of short peaks with decreasing amplitude.
Properties of X̄_n

Theorem (see Appendix A of [SS10])
If X is a stationary time series, the sample mean verifies
    E(X̄_n) = µ,
    V(X̄_n) = (1/n) Σ_{h=−n}^{n} (1 − |h|/n) γ(h).
As a consequence, if Σ_{h=−∞}^{∞} |γ(h)| < ∞, then
    n V(X̄_n) → Σ_{h=−∞}^{∞} γ(h) = σ² Σ_{h=−∞}^{∞} ρ(h)   as n → ∞,
and X̄_n converges in L² to µ.

Notice that, in the independent case, n V(X̄_n) → σ². The correlation hence has the effect of reducing the sample size from n to n / Σ_{h=−∞}^{∞} ρ(h).
Large sample property

Theorem (see Appendix A of [SS10])
Under general conditions, if X is a white noise, then for n large the sample ACF ρ̂_X(h), for h = 1, 2, ..., H, where H is fixed but arbitrary, is approximately normally distributed with zero mean and standard deviation
    σ_{ρ̂_X(h)} = 1/√n.

Consequence: only the peaks outside of ±2/√n may be considered to be significant.

Fig.: Sample ACF 1 (ACF against lag, lags 0 to 20).
ACF and prediction

Linear predictor and ACF
Let X be a stationary time series with ACF ρ. The linear predictor X̂^{(n)}_{n+h} of X_{n+h} given X_n is defined as
    X̂^{(n)}_{n+h} = argmin_{a,b} E[ (X_{n+h} − (a X_n + b))² ] = ρ(h)(X_n − µ) + µ.

Notice that
◮ linear prediction needs only second-order statistics; we'll see later that this is a crucial property for forecasting;
◮ the result extends to longer histories (X_n, X_{n−1}, ...).

Exercise
Prove the result.
Chapter 2: Chasing stationarity, exploratory data analysis
◮ Why do we need to chase stationarity?
  Because we want to do statistics: averaging lagged products over time, as in the previous section, has to be a sensible thing to do.
◮ But... real time series are often non-stationary, so we need methods to "stationarize" the series.
An example I

Fig.: Monthly sales for a souvenir shop on the wharf at a beach resort town in Queensland, Australia. [MWH08]
An example II

Notice that the variance grows with the mean; this usually calls for a log transformation (X → log(X)), which is part of the general family of Box-Cox transformations
    X → (X^λ − 1)/λ   if λ ≠ 0,
    X → log(X)        if λ = 0.
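A minimal R sketch of the variance-stabilising transformations mentioned above; the series `fancy` below is a synthetic stand-in for the souvenir-sales data, and the choice λ = 0.25 is only an example.

```r
set.seed(3)
fancy <- exp(rnorm(84, mean = 9, sd = 0.8))   # hypothetical positive series

log_fancy <- log(fancy)                       # Box-Cox with lambda = 0

box_cox <- function(x, lambda) {
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}
plot.ts(box_cox(fancy, 0.25))                 # e.g. lambda = 0.25
```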
An example III

Fig.: Log of monthly sales. The series is not yet stationary because there are a trend and a seasonal component.
An example IV

Fig.: Decomposition of the log monthly sales with the stl function in R (panels: data, seasonal, trend, remainder).
Classical decomposition of a time series

    Y_t = T_t + S_t + X_t
where
◮ T = (T_t)_{t∈Z} is the trend
◮ S = (S_t)_{t∈Z} is the seasonality
◮ X = (X_t)_{t∈Z} is a stationary centered time series.
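A minimal R sketch of the decomposition Y_t = T_t + S_t + X_t with stl() (the function used for the figure on the previous slide); the monthly series y below is synthetic, built only to make the snippet self-contained.

```r
set.seed(4)
y <- ts(2 + 0.02 * (1:84) + sin(2 * pi * (1:84) / 12) + rnorm(84, sd = 0.2),
        start = c(1987, 1), frequency = 12)   # hypothetical monthly series

dec <- stl(y, s.window = "periodic")
plot(dec)                 # panels: data, seasonal, trend, remainder
head(dec$time.series)     # the estimated S_t, T_t and remainder X_t
```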
Back to the global temperature I

Fig.: Global temperature deviation (in °C) from 1880 to 2015, with base period 1951-1980 - see slide 7.
Back to the global temperature II

Fig.: ACF of the global temperature deviation.
Back to the global temperature III

We model this time series as
    Y_t = T_t + X_t
and are now looking for a model for T = (T_t)_{t∈Z}. Looking at the series, two possible models for T are
◮ (model 1) a linear function of t:  T_t = β_1 + β_2 t
◮ (model 2) a random walk with drift:  T_t = δ + T_{t−1} + η_t, where η is a white noise (see slide 19).
In both models, we notice that
    Y_t − Y_{t−1} = T_t − T_{t−1} + ω_t − ω_{t−1} = β_2 + ω_t − ω_{t−1}          (model 1)
    Y_t − Y_{t−1} = T_t − T_{t−1} + ω_t − ω_{t−1} = δ + η_t + ω_t − ω_{t−1}     (model 2)
are stationary time series (check this fact as an exercise).
Back to the global temperature IV

Fig.: Differenced global temperature deviation, diff(globtemp_1900_1997, 1).
Back to the global temperature V

Fig.: ACF of the differenced global temperature deviation.

Not far from a white noise!
Backshift operator

Backshift operator
For a time series X, we define the backshift operator as
    B X_t = X_{t−1},
and similarly B^k X_t = X_{t−k}.

Difference of order d
Differences of order d are defined as
    ∇^d = (1 − B)^d.

To stationarize the global temperature series, we applied the 1st order difference to it. See http://a-little-book-of-r-for-time-series.readthedocs.io/en/latest/src/timeseries.html for an example of a 2nd order integrated time series.
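A minimal R sketch: the difference operator (1 − B)^d is available through diff(); the random walk x is only a toy example of a non-stationary series.

```r
set.seed(5)
x <- cumsum(rnorm(100))                    # a random walk, hence non-stationary

dx  <- diff(x, lag = 1, differences = 1)   # (1 - B)   x_t = x_t - x_{t-1}
d2x <- diff(x, lag = 1, differences = 2)   # (1 - B)^2 x_t

## For the global temperature series of the previous slides one would use
## diff(globtemp, 1), as in the figure captions.
```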
Moving average smoother

Moving average smoother
For a time series X,
    M_t = Σ_{j=−k}^{k} a_j X_{t−j},
with a_j = a_{−j} ≥ 0 and Σ_{j=−k}^{k} a_j = 1, is a symmetric moving average.

Note: the stl function in R uses loess regression; the moving average smoother is just a loess regression with polynomials of order 1. More details on http://www.wessa.net/download/stl.pdf, [CCT90].
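A minimal R sketch of a symmetric moving-average smoother using stats::filter(); here k = 2, i.e. a 5-point average with equal weights, applied to a synthetic noisy seasonal series.

```r
set.seed(6)
x <- ts(sin(2 * pi * (1:120) / 12) + rnorm(120, sd = 0.5))

k <- 2
a <- rep(1 / (2 * k + 1), 2 * k + 1)          # a_j = 1/5, symmetric, sums to 1
m <- stats::filter(x, filter = a, sides = 2)  # M_t = sum_j a_j X_{t-j}

plot(x)
lines(m, col = "red")
```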
Fig.: Average daily cardiovascular mortality in Los Angeles county over the 10-year period 1970-1979.
Fig.: Smoothed (moving averages of orders 5 and 53) mortality.
Fig.: Smoothed (splines) mortality.
Chapter 3: ARMA models
Introduction

We now consider that we have estimated the trend and seasonal components of
    Y_t = T_t + S_t + X_t.
Aim of the chapter: to propose ARMA models for the time series X. They allow
◮ to describe this time series
◮ to forecast.

Key fact:
◮ for every stationary process with autocovariance function γ verifying lim_{h→∞} γ(h) = 0, it is possible to find an ARMA process with the same autocovariance function, see [BD13];
◮ the Wold decomposition (see [SS10], Appendix B) also plays an important role. It says that every stationary process is the sum of a MA(∞) process and a deterministic process.
AR(1)

Exercise
Consider a time series X following the AR(1) model X_t = φ X_{t−1} + ω_t, ∀ t ∈ Z.
1. Show that for all k > 0, X_t = φ^k X_{t−k} + Σ_{j=0}^{k−1} φ^j ω_{t−j}.
2. Assume that |φ| < 1 and prove that X_t = Σ_{j=0}^{∞} φ^j ω_{t−j} in L².
3. Assume now that |φ| > 1 and prove that
   3.1 Σ_{j=0}^{k−1} φ^j ω_{t−j} does not converge in L²,
   3.2 one can write X_t = − Σ_{j=1}^{∞} φ^{−j} ω_{t+j},
   3.3 discuss why the case |φ| > 1 is useless.
The case where |φ| = 1 is a random walk (slide 19) and we already proved that this is not a stationary time series.
Fig.: Simulated AR(1) paths of length 100, with φ = +0.9 (top) and φ = −0.9 (bottom).
Note

Notice that manipulating operators like Φ(B) is like manipulating polynomials with complex variables. In particular:
    1/(1 − φz) = 1 + φz + φ²z² + ...
provided that |φ| < 1 and |z| ≤ 1.
Causality

Causal linear process
A linear process X is said to be causal when there is
◮ a power series π: π(x) = π_0 + π_1 x + π_2 x² + ...,
◮ with Σ_{j=0}^{∞} |π_j| < ∞,
◮ such that X_t = π(B) ω_t,
where ω is a white noise WN(0, σ²). In this case X_t is σ{ω_t, ω_{t−1}, ...}-measurable.

We will exclude non-causal AR models from consideration. In fact this is not a restriction, because we can find a causal counterpart to such a process.

Exercise
Consider the non-causal AR(1) model X_t = φ X_{t−1} + ω_t with |φ| > 1 and suppose that ω ∼ i.i.d. N(0, σ²).
1. What is the distribution of X_t?
2. Define the time series Y_t = φ^{−1} Y_{t−1} + η_t with η ∼ i.i.d. N(0, σ²/φ²). Prove that X_t and Y_t have the same distribution.
Autoregressive model AR(p)

An autoregressive model of order p is of the form
    X_t = φ_1 X_{t−1} + φ_2 X_{t−2} + ... + φ_p X_{t−p} + ω_t,  ∀ t ∈ Z,
where X is assumed to be stationary and ω is a white noise WN(0, σ²). We will write more concisely
    Φ(B) X_t = ω_t,  ∀ t ∈ Z,
where Φ is the polynomial of degree p
    Φ(x) = 1 − φ_1 x − φ_2 x² − ... − φ_p x^p.
Without loss of generality, we assume that each X_t is centered.
Condition of existence and causality of AR(p)

A stationary solution to Φ(B) X_t = ω_t, ∀ t ∈ Z, exists if and only if
    Φ(z) = 0 ⟹ |z| ≠ 1.
In this case, this defines an AR(p) process, which is causal iff in addition
    Φ(z) = 0 ⟹ |z| > 1.
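A minimal R sketch: checking these conditions through the roots of Φ(z) = 1 − φ_1 z − ... − φ_p z^p with polyroot(); the helper name check_ar is an illustrative choice.

```r
check_ar <- function(phi) {
  roots <- polyroot(c(1, -phi))       # coefficients of Phi(z), increasing powers
  mods  <- Mod(roots)
  list(stationary_solution = all(abs(mods - 1) > 1e-8),  # no root on the unit circle
       causal              = all(mods > 1))              # all roots outside it
}

check_ar(c(1.5, -0.75))  # the AR(2) of the next slide: causal
check_ar(1.1)            # AR(1) with |phi| > 1: stationary solution exists, not causal
```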
Fig.: Simulated AR(2) path with φ_1 = 1.5, φ_2 = −0.75, and the causal region of an AR(2) in the (φ_1, φ_2) plane (real roots vs. complex roots).
Moving average model MA(q)

A moving average model of order q is of the form
    X_t = ω_t + θ_1 ω_{t−1} + θ_2 ω_{t−2} + ... + θ_q ω_{t−q},  ∀ t ∈ Z,
where ω is a white noise WN(0, σ²). We will write more concisely
    X_t = Θ(B) ω_t,  ∀ t ∈ Z,
where Θ is the polynomial of degree q
    Θ(x) = 1 + θ_1 x + θ_2 x² + ... + θ_q x^q.
Unlike the AR model, the MA model is stationary for any values of the thetas.
Fig.: Simulated MA(1) path of length 100 with θ = +0.9.
Invertibility I

Invertibility
A linear process X is invertible when there is
◮ a power series π: π(x) = π_0 + π_1 x + π_2 x² + ...,
◮ with Σ_{j=0}^{∞} |π_j| < ∞,
◮ such that ω_t = π(B) X_t,
where ω is a white noise WN(0, σ²).

Invertibility of a MA(1) process
Consider the MA(1) process
    X_t = ω_t + θ ω_{t−1} = (1 + θB) ω_t,  ∀ t ∈ Z,
where ω is a white noise WN(0, σ²). Show that
◮ if |θ| < 1, ω_t = Σ_{j=0}^{∞} (−θ)^j X_{t−j},
◮ if |θ| > 1, ω_t = − Σ_{j=1}^{∞} (−θ)^{−j} X_{t+j}.
In the first case, X is invertible.
Invertibility II

Exercise
Consider the non-invertible MA(1) model X_t = ω_t + θ ω_{t−1} with |θ| > 1 and suppose that ω ∼ i.i.d. N(0, σ²).
1. What is the distribution of X_t?
2. Can we define an invertible time series Y, defined through a new Gaussian white noise η, such that X_t and Y_t have the same distribution (∀ t)?
Autoregressive moving average model

Autoregressive moving average model ARMA(p, q)
An ARMA(p, q) process X is a stationary process that is defined through
    Φ(B) X_t = Θ(B) ω_t,
where ω ∼ WN(0, σ²), Φ is a polynomial of order p, Θ is a polynomial of order q, and Φ and Θ have no common factors.

Exercise
Consider the process X defined by X_t − 0.5 X_{t−1} = ω_t − 0.5 ω_{t−1}. Is it truly an ARMA(1,1) process?
Stationarity, causality and invertibility

Theorem
Consider the equation Φ(B) X_t = Θ(B) ω_t, where Φ and Θ have no common factors.
◮ There exists a stationary solution iff Φ(z) = 0 ⟹ |z| ≠ 1.
◮ This ARMA(p, q) process is causal iff Φ(z) = 0 ⟹ |z| > 1.
◮ It is invertible iff the roots of Θ(z) are outside the unit circle.

Exercise
Discuss the stationarity, causality and invertibility of
    (1 − 1.5B) X_t = (1 + 0.2B) ω_t.
Theorem
Let X be an ARMA process defined by Φ(B) X_t = Θ(B) ω_t. If
    Θ(z) ≠ 0  for all |z| = 1,
then there are polynomials Φ̃ and Θ̃ and a white noise sequence ω̃ such that X satisfies
◮ Φ̃(B) X_t = Θ̃(B) ω̃_t,
◮ and is a causal,
◮ invertible ARMA process.

We can now consider only causal and invertible ARMA processes.
The linear process representation of an ARMA

Causal and invertible representations
Consider a causal, invertible ARMA process defined by Φ(B) X_t = Θ(B) ω_t. It can be rewritten
◮ as a MA(∞):
    X_t = (Θ(B)/Φ(B)) ω_t = ψ(B) ω_t = Σ_{k≥0} ψ_k ω_{t−k}
◮ or as an AR(∞):
    ω_t = (Φ(B)/Θ(B)) X_t = π(B) X_t = Σ_{k≥0} π_k X_{t−k}.
Notice that both π_0 and ψ_0 equal 1, and that (ψ_k) and (π_k) are entirely determined by (φ_k) and (θ_k).
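A minimal R sketch: the ψ weights of a causal, invertible ARMA can be obtained with ARMAtoMA(); the π weights follow from the same function by swapping the roles of the two polynomials (Φ/Θ instead of Θ/Φ), which amounts to feeding it −θ as the AR part and −φ as the MA part.

```r
phi <- 0.5; theta <- 0.4                               # an illustrative ARMA(1,1)

psi <- ARMAtoMA(ar = phi,    ma = theta, lag.max = 10) # psi_1, psi_2, ... of Theta/Phi
pi_ <- ARMAtoMA(ar = -theta, ma = -phi,  lag.max = 10) # pi_1,  pi_2,  ... of Phi/Theta

psi[1:3]   # here psi_k = (phi + theta) * phi^(k-1)
pi_[1:3]   # here pi_k  = -(phi + theta) * (-theta)^(k-1)
## psi_0 = pi_0 = 1 are not returned by ARMAtoMA().
```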
Autocovariance function of an ARMA

Autocovariance of an ARMA
The autocovariance function of an ARMA(p, q) follows from its MA(∞) representation and equals
    γ(h) = σ² Σ_{k≥0} ψ_k ψ_{k+h},   ∀ h ≥ 0.

Exercise
◮ Compute the ACF of a causal ARMA(1, 1).
◮ Show that the ACF of this ARMA verifies a linear difference equation of order 1. Solve this equation.
◮ Compute φ and θ from the ACF.
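A minimal R sketch: γ(h) = σ² Σ_k ψ_k ψ_{k+h} approximated by truncating the sum at 200 terms, and compared with the exact ACF returned by ARMAacf(); the parameter values are illustrative.

```r
phi <- 0.5; theta <- 0.4; sigma2 <- 1
psi <- c(1, ARMAtoMA(ar = phi, ma = theta, lag.max = 200))   # psi_0 = 1, psi_1, ...

gamma_h <- function(h) {
  L <- length(psi)
  sigma2 * sum(psi[1:(L - h)] * psi[(1 + h):L])   # truncated sum_k psi_k psi_{k+h}
}

gamma_h(1) / gamma_h(0)                     # ACF at lag 1
ARMAacf(ar = phi, ma = theta, lag.max = 1)  # exact ACF (lags 0 and 1), for comparison
```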
Chapter 4: Linear prediction and partial autocorrelation function
Introduction

We'll see that if we know
◮ the orders (p and q) and
◮ the coefficients
of the ARMA model under consideration, we can build predictions and prediction intervals.
Just to be sure....

◮ The linear space L² of r.v. with finite variance, with the inner product ⟨X, Y⟩ = E(XY), is a Hilbert space.
◮ Now consider a time series X with X_t ∈ L² for all t:
  ◮ the subspace H_n = span(X_1, ..., X_n) is a closed subspace of L², hence
  ◮ for all Y ∈ L² there exists a unique projection P(Y) in H_n such that, for all w ∈ H_n,
        ‖P(Y) − Y‖ ≤ ‖w − Y‖   and   ⟨P(Y) − Y, w⟩ = 0.
Best linear predictor

Given X_1, X_2, ..., X_n, the best linear m-step-ahead predictor of X_{n+m} is
    X^{(n)}_{n+m} = α_0 + φ^{(m)}_{n1} X_n + φ^{(m)}_{n2} X_{n−1} + ... + φ^{(m)}_{nn} X_1 = α_0 + Σ_{j=1}^{n} φ^{(m)}_{nj} X_{n+1−j},
defined as the orthogonal projection of X_{n+m} onto span{1, X_1, ..., X_n}. In particular, it satisfies the prediction equations
    E(X^{(n)}_{n+m} − X_{n+m}) = 0,
    E((X^{(n)}_{n+m} − X_{n+m}) X_k) = 0,   ∀ k = 1, ..., n.
We'll now compute α_0 and the φ^{(m)}_{nj}'s.
Derivation of α_0

Taking expectations in the first prediction equation gives α_0 = µ (1 − Σ_{j=1}^{n} φ^{(m)}_{nj}), hence
    X^{(n)}_{n+m} − µ = α_0 + Σ_{j=1}^{n} φ^{(m)}_{nj} X_{n+1−j} − µ = Σ_{j=1}^{n} φ^{(m)}_{nj} (X_{n+1−j} − µ).
◮ Thus, we'll ignore α_0 and put µ = 0 until we discuss estimation.
◮ There are two consequences:
  1. the projection of X_{n+m} onto span{1, X_1, ..., X_n} is in fact the projection onto span{X_1, ..., X_n},
  2. E(X_k X_l) = Cov(X_k, X_l).
Derivation of the φ^{(m)}_{nj}'s

As X^{(n)}_{n+m} satisfies the prediction equations of slide 74, we can write, for all k = 1, ..., n,
    E((X^{(n)}_{n+m} − X_{n+m}) X_k) = 0
    ⟺ Σ_{j=1}^{n} φ^{(m)}_{nj} E(X_{n+1−j} X_{n+1−k}) = E(X_{n+m} X_{n+1−k})
    ⟺ Σ_{j=1}^{n} φ^{(m)}_{nj} γ(k − j) = γ(m + k − 1).
This can be rewritten in matrix notation.
Prediction

Prediction equations
The φ^{(m)}_{nj}'s verify
    Γ_n φ^{(m)} = γ^{(m)},
where
    Γ_n = [γ(k − j)]_{1 ≤ j,k ≤ n},
    φ^{(m)} = (φ^{(m)}_{n1}, ..., φ^{(m)}_{nn})ᵀ,
    γ^{(m)} = (γ(m), ..., γ(m + n − 1))ᵀ.

Prediction error
The mean square prediction error is given by
    P^{(n)}_{n+m} = E[ (X_{n+m} − X^{(n)}_{n+m})² ] = γ(0) − (γ^{(m)})ᵀ Γ_n^{−1} γ^{(m)}.
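A minimal R sketch: the prediction equations Γ_n φ^{(m)} = γ^{(m)} solved numerically for a causal AR(2), with m = 1 and n = 10 (since the coefficients are unchanged when γ is replaced by ρ, we work with correlations from ARMAacf()); this anticipates the exercise and figures of the next slides.

```r
phi_true <- c(1.5, -0.75); sigma2 <- 1
n <- 10; m <- 1

rho   <- ARMAacf(ar = phi_true, lag.max = n + m)   # rho(0), ..., rho(n + m)
Gamma <- toeplitz(rho[1:n])                        # matrix of rho(k - j)
gam_m <- rho[(m + 1):(m + n)]                      # rho(m), ..., rho(m + n - 1)

phi_n <- solve(Gamma, gam_m)
round(phi_n, 6)        # for an AR(2), only the first two coefficients are non-zero

## Mean square prediction error: gamma(0) * (1 - rho^(m)' phi^(m))
gamma0 <- sigma2 * sum(c(1, ARMAtoMA(ar = phi_true, lag.max = 500))^2)
gamma0 * (1 - sum(gam_m * phi_n))    # close to sigma2 for one-step prediction
```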
Forecasting an AR(2)

Exercise
Consider the causal AR(2) model X_t = φ_1 X_{t−1} + φ_2 X_{t−2} + ω_t.
1. Determine the one-step-ahead prediction X^{(2)}_3 of X_3 based on X_1, X_2 from the prediction equations.
2. From causality, determine X^{(2)}_3.
3. How are φ^{(1)}_{21}, φ^{(1)}_{22} and φ_1, φ_2 related?
PACF

Partial autocorrelation function
The partial autocorrelation function (PACF) of a stationary time series X is defined as
    φ_{11} = cor(X_1, X_0) = ρ(1),
    φ_{hh} = cor(X_h − X̂_h^{(h−1)}, X_0 − X̂_0^{(h−1)})   for h ≥ 2,
where X̂_0^{(h−1)} is the orthogonal projection of X_0 onto span{X_1, ..., X_{h−1}} (and X̂_h^{(h−1)} that of X_h).

Notice that
◮ X_h − X̂_h^{(h−1)} and X_0 − X̂_0^{(h−1)} are, by construction, uncorrelated with {X_1, ..., X_{h−1}}, so φ_{hh} is the correlation between X_h and X_0 with the linear dependence of each on X_1, ..., X_{h−1} removed;
◮ the coefficient φ_{hh} is also the last coefficient (i.e. φ^{(1)}_{hh}) in the best linear one-step-ahead prediction of X_{h+1} given X_1, ..., X_h.
Forecasting and PACF of causal AR(p) models

PACF of an AR(p) model
Consider the causal AR(p) model X_t = Σ_{i=1}^{p} φ_i X_{t−i} + ω_t.
1. Consider p = 2 and verify that X^{(n)}_{n+1} = φ_1 X_n + φ_2 X_{n−1}. Deduce the value of the PACF for h > 2.
2. In the general case, deduce the value of the PACF for h > p.

Fig.: ACF and PACF of an AR(2) model.
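A minimal R sketch illustrating the result of this slide: the PACF of a causal AR(2) vanishes for h > 2, while its ACF only decays; ARMAacf() returns both.

```r
phi <- c(1.5, -0.75)

round(ARMAacf(ar = phi, lag.max = 6, pacf = TRUE), 4)  # zero from lag 3 on
round(ARMAacf(ar = phi, lag.max = 6), 4)               # the ACF decays instead

## On data: compare acf(x) and pacf(x) for x <- arima.sim(list(ar = phi), 200).
```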
PACF of invertible MA models

Exercise: PACF of a MA(1) model
Consider the invertible MA(1) model X_t = ω_t + θ ω_{t−1}.
1. Compute X̂^{(2)}_3 and X̂^{(2)}_1, the orthogonal projections of X_3 and X_1 onto span{X_2}.
2. Deduce the first two values of the PACF.

More calculations (see Problem 3.23 in [BD13]) give
    φ_{hh} = − (−θ)^h (1 − θ²) / (1 − θ^{2(h+1)}).
In general, the PACF of a MA(q) model does not vanish for larger lags; it is however bounded by a geometrically decreasing function.
Fig.: ACF and PACF plots (lags up to 20).
An AR(2) model for the recruitment series

Fig.: The Recruitment series with its sample ACF and PACF (lags up to 4 years).
Fig.: Twenty-four month forecasts for the Recruitment series shown on slide 11.
ACF and PACF

So far, we showed that

    Model     ACF               PACF
    AR(p)     decays            zero for h > p
    MA(q)     zero for h > q    decays
    ARMA      decays            decays

◮ We can use these results to build a model.
◮ And we know how to forecast in an AR(p) model.
◮ It remains to give algorithms that will allow to forecast in MA and ARMA models.
Innovations

So far, we have written X^n_{n+1} as Σ_{j=1}^{n} φ^{(1)}_{nj} X_{n+1−j}, i.e. as the projection of X_{n+1} onto span{X_1, ..., X_n}, but we clearly have
    span{X_1, ..., X_n} = span{X_1, X_2 − X^1_2, X_3 − X^2_3, ..., X_n − X^{n−1}_n}.

Innovations
The values X_t − X^{t−1}_t are called the innovations. They verify: X_t − X^{t−1}_t is orthogonal to span{X_1, ..., X_{t−1}}.

As a consequence, we can rewrite
    X^n_{n+1} = Σ_{j=1}^{n} θ_{nj} (X_{n+1−j} − X^{n−j}_{n+1−j}).
The one-step-ahead predictors X^t_{t+1} and their mean-squared errors P^t_{t+1} can be calculated iteratively via the innovations algorithm.
The innovations algorithm

The innovations algorithm
The one-step-ahead predictors can be computed iteratively via
    X^0_1 = 0,   P^0_1 = γ(0),
and, for t = 1, 2, ...,
    X^t_{t+1} = Σ_{j=1}^{t} θ_{tj} (X_{t+1−j} − X^{t−j}_{t+1−j}),
    P^t_{t+1} = γ(0) − Σ_{j=0}^{t−1} θ²_{t,t−j} P^j_{j+1},
where
    θ_{t,t−h} = (P^h_{h+1})^{−1} [ γ(t − h) − Σ_{k=0}^{h−1} θ_{h,h−k} θ_{t,t−k} P^k_{k+1} ],   h = 0, 1, ..., t − 1.
This can be solved by calculating P^0_1, θ_{11}, P^1_2, θ_{22}, θ_{21}, P^2_3, etc.
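A minimal plain-R sketch of the recursion on this slide; the function name innovations and the indexing conventions are illustrative choices, and the example reuses the MA(1) autocovariances of the next slide.

```r
innovations <- function(gam, x) {
  ## gam: c(gamma(0), gamma(1), ..., gamma(n));  x: observations X_1, ..., X_n
  n     <- length(x)
  theta <- matrix(0, n, n)       # theta[t, j] stores theta_{t, j}
  P     <- numeric(n + 1)        # P[t + 1]    stores P^t_{t+1}
  xhat  <- numeric(n + 1)        # xhat[t + 1] stores X^t_{t+1}; xhat[1] = X^0_1 = 0
  P[1]  <- gam[1]                # P^0_1 = gamma(0)
  for (t in 1:n) {
    for (h in 0:(t - 1)) {
      s <- 0
      if (h > 0)
        for (k in 0:(h - 1)) s <- s + theta[h, h - k] * theta[t, t - k] * P[k + 1]
      theta[t, t - h] <- (gam[t - h + 1] - s) / P[h + 1]
    }
    P[t + 1]    <- gam[1] - sum(theta[t, t:1]^2 * P[1:t])
    xhat[t + 1] <- sum(theta[t, 1:t] * (x[t:1] - xhat[t:1]))
  }
  list(pred = xhat, mse = P)
}

## Example: one-step predictors for an MA(1) with theta = 0.4, sigma^2 = 1
theta1 <- 0.4
set.seed(7)
x   <- as.numeric(arima.sim(model = list(ma = theta1), n = 50))
g   <- c(1 + theta1^2, theta1, rep(0, length(x) - 1))   # gamma(0), ..., gamma(n)
out <- innovations(g, x)
out$pred[length(x) + 1]   # X^n_{n+1}
out$mse[length(x) + 1]    # P^n_{n+1}, close to sigma^2 = 1
```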
Prediction for an MA(1)

Exercise
Consider the MA(1) model X_t = ω_t + θ ω_{t−1} with ω ∼ WN(0, σ²). We know that γ(0) = σ²(1 + θ²), γ(1) = θσ² and γ(h) = 0 for h ≥ 2. Show that
    X^n_{n+1} = θ (X_n − X^{n−1}_n) / r_n,
with r_n = P^{n−1}_n / σ².
The innovations algorithm for the ARMA(p, q) model

Consider an ARMA(p, q) model Φ(B) X_t = Θ(B) ω_t with ω ∼ WN(0, σ²). Let m = max(p, q). To simplify calculations, the innovations algorithm is not applied directly to X but to
    W_t = σ^{−1} X_t          for t = 1, ..., m,
    W_t = σ^{−1} Φ(B) X_t     for t > m.
See page 175 of [BD13].
Infinite past

We will now show that it is easier, for a causal, invertible ARMA process Φ(B) X_t = Θ(B) ω_t, to approximate X^{(n)}_{n+h} by a truncation of the projection of X_{n+h} onto the infinite past
    H̄_n = span̄{X_n, X_{n−1}, ...} = span̄{X_k, k ≤ n}.
The projection onto H̄_n = span̄(X_k, k ≤ n) can be defined as
    lim_{k→∞} P_{span(X_{n−k}, ..., X_n)}.
We will define X̃_{n+h} and ω̃_{n+h} as the projections of X_{n+h} and ω_{n+h} onto H̄_n.
Causal and invertible

Recall (see slide 69) that since X is causal and invertible, we may write
◮ X_{n+h} = Σ_{k≥0} ψ_k ω_{n+h−k}   (MA(∞) representation)
◮ ω_{n+h} = Σ_{k≥0} π_k X_{n+h−k}   (AR(∞) representation).
Now, applying the projection operator onto H̄_n on both sides of both equations, we get
    X̃_{n+h} = Σ_{k≥0} ψ_k ω̃_{n+h−k},                              (2)
    ω̃_{n+h} = Σ_{k≥0} π_k X̃_{n+h−k}.                              (3)
Iteration

Since ω_{n+h} is orthogonal to H̄_n for h ≥ 1, we have ω̃_{n+h} = 0, and (3) gives
    X̃_{n+h} = − Σ_{k≥1} π_k X̃_{n+h−k}
and
    E[ (X_{n+h} − X̃_{n+h})² ] = σ² Σ_{j=0}^{h−1} ψ²_j.
As X̃_t = X_t for all t ≤ n, we can define recursively
    X̃_{n+1} = − Σ_{k≥1} π_k X_{n+1−k},
    X̃_{n+2} = − π_1 X̃_{n+1} − Σ_{k≥2} π_k X_{n+2−k},
    ...
Truncation

In practice, we do not observe the past from −∞ but only X_1, ..., X_n, so we use a truncated version
    X̃^T_{n+1} = − Σ_{k=1}^{n} π_k X_{n+1−k},
    X̃^T_{n+2} = − π_1 X̃^T_{n+1} − Σ_{k=2}^{n+1} π_k X_{n+2−k},
    ...
and E[ (X_{n+h} − X̃_{n+h})² ] = σ² Σ_{j=0}^{h−1} ψ²_j is used as an approximation of the prediction error.
Chapter 5: Estimation and model selection
Introduction

We saw in the last chapter that if we know
◮ the orders (p and q) and
◮ the coefficients
of the ARMA model under consideration, we can build predictions and prediction intervals.

The aim of this chapter is to present
◮ methods for estimating the coefficients when the orders (p and q) are known,
◮ model selection methods, i.e. methods for selecting p and q.

Caution:
◮ to avoid confusion, true parameters now wear a star: σ^{2,⋆}, φ⋆_1, ..., φ⋆_p, θ⋆_1, ..., θ⋆_q;
◮ we have a sample (X_1, ..., X_n) to build estimators.
Moment estimation

We assumed that µ⋆ = 0 (without loss of generality) in Chapter 4. We now consider causal and invertible ARMA processes of the form
    Φ(B)(X_t − µ⋆) = Θ(B) ω_t,   where E(X_t) = µ⋆.

Estimation of the mean
For a stationary time series, the moment estimator of µ⋆ is the sample mean X̄_n.

AR(1) model
Give the moment estimators in a stationary AR(1) model.
Moment estimators for AR(p) models

Yule-Walker equations for an AR(p)
The autocovariance function and parameters of the AR(p) model verify
    Γ_p φ⋆ = γ_p   and   σ^{2,⋆} = γ(0) − (φ⋆)ᵀ γ_p,
where
    Γ_p = [γ(k − j)]_{1 ≤ j,k ≤ p},   φ⋆ = (φ⋆_1, ..., φ⋆_p)ᵀ,   γ_p = (γ(1), ..., γ(p))ᵀ.
This leads to
    φ̂ = Γ̂_p^{−1} γ̂_p   and   σ̂² = γ̂(0) − φ̂ᵀ γ̂_p.

AR(2)
Verify the Yule-Walker equations for a causal AR(2) model.
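A minimal R sketch: Yule-Walker estimation of a causal AR(2) "by hand" from the sample autocovariances, compared with ar.yw(); the simulated data and seed are arbitrary.

```r
set.seed(8)
x <- as.numeric(arima.sim(model = list(ar = c(1.5, -0.75)), n = 500))
p <- 2

g <- acf(x, lag.max = p, type = "covariance", plot = FALSE)$acf[, 1, 1]  # gamma_hat(0..p)
Gamma_p <- toeplitz(g[1:p])
gamma_p <- g[2:(p + 1)]

phi_hat    <- solve(Gamma_p, gamma_p)          # Gamma_p^{-1} gamma_p
sigma2_hat <- g[1] - sum(phi_hat * gamma_p)
phi_hat; sigma2_hat

ar.yw(x, order.max = p, aic = FALSE)$ar        # essentially the same estimates
```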
Asymptotics

Asymptotic distribution of moment estimators
Under mild conditions on ω, and if the AR(p) is causal, the Yule-Walker estimators verify
    √n (φ̂ − φ⋆) →_L N(0, σ^{2,⋆} Γ_p^{−1}),
    σ̂² →_P σ^{2,⋆}.
The only case in which the moment method is (asymptotically) efficient is the AR(p) model.
Likelihood of a causal AR(1) model I

We now deal with maximum likelihood estimation; we assume that ω ∼ i.i.d. N(0, σ^{2,⋆}).
The likelihood of the causal AR(1) model
    X_t = µ⋆ + φ⋆ (X_{t−1} − µ⋆) + ω_t
is given by
    L_n(µ, φ, σ²) = f_{µ,φ,σ²}(X_1, ..., X_n)
                  = f_{µ,φ,σ²}(X_1) f_{µ,φ,σ²}(X_2 | X_1) f_{µ,φ,σ²}(X_3 | X_1, X_2) ... f_{µ,φ,σ²}(X_n | X_1, X_2, ..., X_{n−1}).
Likelihood of a causal AR(1) model II

We can now write the log-likelihood
    ℓ_n(µ, φ, σ²) = log L_n(µ, φ, σ²)
                  = − (n/2) log(2π) − (n/2) log(σ²) + (1/2) log(1 − φ²) − S(µ, φ)/(2σ²),
where
    S(µ, φ) = (1 − φ²)(X_1 − µ)² + Σ_{k=2}^{n} [ X_k − µ − φ(X_{k−1} − µ) ]².
It is straightforward to see that
    σ̂² = (1/n) S(µ̂, φ̂),
where
    (µ̂, φ̂) = argmin_{µ,φ} { log(S(µ, φ)/n) − (1/n) log(1 − φ²) }.
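A minimal R sketch: the profiled criterion of this slide minimised numerically with optim(), and compared with arima(..., method = "ML"); the simulated series, starting values and box constraints on φ are illustrative choices.

```r
set.seed(9)
x <- as.numeric(arima.sim(model = list(ar = 0.6), n = 300)) + 2   # true mu* = 2
n <- length(x)

negloglik <- function(par) {          # par = (mu, phi); sigma^2 profiled out
  mu <- par[1]; phi <- par[2]
  S <- (1 - phi^2) * (x[1] - mu)^2 +
       sum((x[2:n] - mu - phi * (x[1:(n - 1)] - mu))^2)
  0.5 * n * log(S / n) - 0.5 * log(1 - phi^2)
}

opt <- optim(c(mean(x), 0), negloglik, method = "L-BFGS-B",
             lower = c(-Inf, -0.99), upper = c(Inf, 0.99))
opt$par                                        # (mu_hat, phi_hat)

arima(x, order = c(1, 0, 0), method = "ML")    # comparison; "intercept" is the mean
```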