Lecture 6 Discrete Time Series 9/21/2018
Discrete Time Series
Stationary Processes
A stochastic process (i.e. a time series) is considered to be strictly stationary if the properties of the process are not changed by a shift in origin.
In the time series context this means that the joint distribution of $\{y_{t_1}, \ldots, y_{t_n}\}$ must be identical to the distribution of $\{y_{t_1+k}, \ldots, y_{t_n+k}\}$ for any value of $n$ and $k$.
Weak Stationarity
Strict stationarity is unnecessarily strong / restrictive for many applications, so instead we often opt for weak stationarity, which requires the following:
1. The process has finite variance: $E(y_t^2) < \infty$ for all $t$
2. The mean of the process is constant: $E(y_t) = \mu$ for all $t$
3. The second moment only depends on the lag: $Cov(y_t, y_s) = Cov(y_{t+k}, y_{s+k})$ for all $t, s, k$
When we say stationary in class we will almost always mean weakly stationary.
Autocorrelation
For a stationary time series, where $E(y_t) = \mu$ and $Var(y_t) = \sigma^2$ for all $t$, we define the autocorrelation at lag $k$ as
$$\rho_k = Cor(y_t, y_{t+k}) = \frac{Cov(y_t, y_{t+k})}{\sqrt{Var(y_t)\,Var(y_{t+k})}} = \frac{E\big((y_t - \mu)(y_{t+k} - \mu)\big)}{\sigma^2}$$
this is also sometimes written in terms of the autocovariance function ($\gamma_k$) as
$$\gamma_k = \gamma(t, t+k) = Cov(y_t, y_{t+k})$$
$$\rho_k = \frac{\gamma(t, t+k)}{\sqrt{\gamma(t, t)\,\gamma(t+k, t+k)}} = \frac{\gamma(k)}{\gamma(0)}$$
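The definitions above can be checked numerically. The following is a minimal sketch in base R, assuming the usual sample estimator (divisor $n$, deviations from the overall mean), which is the convention `acf()` uses by default. The helper names `sample_acov` and `sample_acor` and the white-noise data are illustrative and not part of the lecture.

```r
# Sample autocovariance at lag k: divisor n, deviations from the overall mean
sample_acov <- function(y, k) {
  n <- length(y)
  y_bar <- mean(y)
  sum((y[1:(n - k)] - y_bar) * (y[(1 + k):n] - y_bar)) / n
}

# Sample autocorrelation: rho_k = gamma(k) / gamma(0)
sample_acor <- function(y, k) sample_acov(y, k) / sample_acov(y, 0)

set.seed(444)   # arbitrary seed, for reproducibility only
y <- rnorm(200) # illustrative data: white noise

manual  <- sapply(0:5, function(k) sample_acor(y, k))
builtin <- as.numeric(acf(y, lag.max = 5, plot = FALSE)$acf)

print(round(cbind(lag = 0:5, manual, builtin), 4))
```

The two columns should agree up to floating point error, since both compute $\hat\gamma(k) / \hat\gamma(0)$.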
Covariance Structure
Based on our definition of a (weakly) stationary process, it implies a covariance of the following structure,
$$
\boldsymbol{\Sigma} =
\begin{pmatrix}
\gamma(0) & \gamma(1) & \gamma(2) & \gamma(3) & \cdots & \gamma(n-1) & \gamma(n) \\
\gamma(1) & \gamma(0) & \gamma(1) & \gamma(2) & \cdots & \gamma(n-2) & \gamma(n-1) \\
\gamma(2) & \gamma(1) & \gamma(0) & \gamma(1) & \cdots & \gamma(n-3) & \gamma(n-2) \\
\gamma(3) & \gamma(2) & \gamma(1) & \gamma(0) & \cdots & \gamma(n-4) & \gamma(n-3) \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
\gamma(n-1) & \gamma(n-2) & \gamma(n-3) & \gamma(n-4) & \cdots & \gamma(0) & \gamma(1) \\
\gamma(n) & \gamma(n-1) & \gamma(n-2) & \gamma(n-3) & \cdots & \gamma(1) & \gamma(0)
\end{pmatrix}
$$
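A covariance matrix of this form, constant along each diagonal, is a Toeplitz matrix. As a small sketch, base R's `toeplitz()` builds it from the vector $\gamma(0), \ldots, \gamma(n)$. The autocovariance values below are hypothetical and chosen only for illustration.

```r
# Hypothetical autocovariances gamma(0), ..., gamma(4), for illustration only
gamma <- c(2, 1, 0.5, 0.25, 0.125)

# toeplitz() places gamma(|i - j|) in entry (i, j)
Sigma <- toeplitz(gamma)
print(Sigma)

# Dividing by gamma(0) gives the matching autocorrelation matrix
R <- Sigma / gamma[1]
print(R)
```

Every entry depends only on the lag $|i - j|$, which is the third weak stationarity condition written as a matrix.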
Example - Random walk
Let $y_t = y_{t-1} + w_t$ with $y_0 = 0$ and $w_t \sim \mathcal{N}(0, 1)$.
[Figure: Random walk, y against t for t = 0 to 1000]
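A series like this one can be simulated in a couple of lines. This is a sketch only: the seed is arbitrary, and the data frame name `rw` is borrowed from the `rw$y` label on the ACF slide rather than from any code shown in the lecture.

```r
set.seed(444)   # arbitrary seed, for reproducibility only
n <- 1000
w <- rnorm(n)   # w_t ~ N(0, 1)
y <- cumsum(w)  # y_t = y_{t-1} + w_t with y_0 = 0, so y_t = w_1 + ... + w_t

rw <- data.frame(t = 1:n, y = y)
print(head(rw))
```

Calling `plot(rw$t, rw$y, type = "l")` draws the path. Different seeds give visibly different wandering paths.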
ACF + PACF
[Figure: rw$y series (t = 0 to 1000) with its ACF and PACF for lags 0 to 50]
Stationary?
Is $y_t$ stationary?
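One way to approach this question is to check the weak stationarity conditions directly. The sketch below assumes the $w_t$ are independent, which the slide does not state explicitly.

```latex
\begin{aligned}
y_t &= \sum_{i=1}^{t} w_i \\
E(y_t) &= \sum_{i=1}^{t} E(w_i) = 0 \\
Var(y_t) &= \sum_{i=1}^{t} Var(w_i) = t \\
Cov(y_t, y_{t+k}) &= Cov\Big(y_t,\; y_t + \sum_{i=t+1}^{t+k} w_i\Big) = Var(y_t) = t
  \quad (k \ge 0)
\end{aligned}
```

The mean is constant, but the variance and covariance depend on $t$ and not only on the lag $k$, so under this assumption the random walk is not weakly stationary.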
Partial Autocorrelation - pACF
Given these types of patterns in the autocorrelation we often want to examine the relationship between $y_t$ and $y_{t+k}$ with the (linear) dependence of $y_t$ on $y_{t+1}$ through $y_{t+k-1}$ removed.
This is done through the calculation of a partial autocorrelation ($\alpha(k)$), which is defined as follows:
$$
\begin{aligned}
\alpha(0) &= 1 \\
\alpha(1) &= \rho(1) = Cor(y_t, y_{t+1}) \\
&\;\;\vdots \\
\alpha(k) &= Cor\big(y_t - P_{t,k}(y_t),\; y_{t+k} - P_{t,k}(y_{t+k})\big)
\end{aligned}
$$
where $P_{t,k}(y)$ is the projection of $y$ onto the space spanned by $y_{t+1}, \ldots, y_{t+k-1}$.
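The projection in this definition can be approximated with ordinary least squares: regress both $y_t$ and $y_{t+k}$ on the intermediate values and correlate the two sets of residuals. The sketch below does this in base R. The helper `partial_cor` and the simulated AR(1) series are illustrative assumptions, and `pacf()` uses a different estimator (a recursion on the sample ACF), so the two columns should be close but not identical.

```r
# Partial autocorrelation at lag k via regression residuals (illustrative helper)
partial_cor <- function(y, k) {
  n <- length(y)
  if (k == 1) return(cor(y[-n], y[-1]))
  # embed(y, k + 1): each row holds (y_{t+k}, y_{t+k-1}, ..., y_t)
  m <- embed(y, k + 1)
  y_lead <- m[, 1]
  y_lag  <- m[, k + 1]
  mid    <- m[, 2:k, drop = FALSE]  # y_{t+k-1}, ..., y_{t+1}
  # Residuals after removing the linear dependence on the intermediate values
  r_lead <- resid(lm(y_lead ~ mid))
  r_lag  <- resid(lm(y_lag ~ mid))
  cor(r_lag, r_lead)
}

set.seed(444)  # arbitrary seed
y <- as.numeric(arima.sim(model = list(ar = 0.7), n = 5000))  # illustrative AR(1)

manual  <- sapply(1:3, function(k) partial_cor(y, k))
builtin <- as.numeric(pacf(y, lag.max = 3, plot = FALSE)$acf)

print(round(cbind(lag = 1:3, manual, builtin), 4))
```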
Example - Random walk with drift
Let $y_t = \delta + y_{t-1} + w_t$ with $y_0 = 0$ and $w_t \sim \mathcal{N}(0, 1)$.
[Figure: Random walk with trend, y against t for t = 0 to 1000]
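A drift term only adds a constant to each increment, so the simulation changes by one term. The slide does not give a value for $\delta$; the value `delta <- 0.1` below is an assumption chosen for illustration, as are the seed and the name `rwt` (taken from the `rwt$y` plot label).

```r
set.seed(444)       # arbitrary seed
n <- 1000
delta <- 0.1        # ASSUMED drift; the slide does not state delta
w <- rnorm(n)       # w_t ~ N(0, 1)
y <- cumsum(delta + w)  # y_t = delta + y_{t-1} + w_t with y_0 = 0

rwt <- data.frame(t = 1:n, y = y)
print(head(rwt))
```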
ACF + PACF
[Figure: rwt$y series (t = 0 to 1000) with its ACF and PACF for lags 0 to 50]
Stationary?
Is $y_t$ stationary?
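The same check as for the plain random walk can be sketched here, again assuming independent $w_t$.

```latex
\begin{aligned}
y_t &= \delta t + \sum_{i=1}^{t} w_i \\
E(y_t) &= \delta t \\
Var(y_t) &= \sum_{i=1}^{t} Var(w_i) = t
\end{aligned}
```

For $\delta \neq 0$ the mean already depends on $t$, and the variance grows with $t$ for any $\delta$, so under this assumption the process is not weakly stationary.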
Example - Moving Average
Let $w_t \sim \mathcal{N}(0, 1)$ and $y_t = w_{t-1} + w_t$.
[Figure: Moving Average, y against t for t = 0 to 100]
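As a sketch, this process can be simulated by drawing one extra noise term for $w_0$ and adding neighbouring values. The seed is arbitrary and the name `ma` follows the `ma$y` plot label.

```r
set.seed(444)              # arbitrary seed
n <- 100
w <- rnorm(n + 1)          # w_0, w_1, ..., w_n
y <- w[-1] + w[-(n + 1)]   # y_t = w_t + w_{t-1} for t = 1, ..., n

ma <- data.frame(t = 1:n, y = y)
print(head(ma))
```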
ACF + PACF
[Figure: ma$y series (t = 0 to 100) with its ACF and PACF for lags 0 to 50]
Stationary?
Is $y_t$ stationary?
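For the moving average the moments can be worked out in closed form. The sketch below assumes the $w_t$ are independent.

```latex
\begin{aligned}
E(y_t) &= E(w_{t-1}) + E(w_t) = 0 \\
\gamma(0) &= Var(w_{t-1}) + Var(w_t) = 2 \\
\gamma(1) &= Cov(w_{t-1} + w_t,\; w_t + w_{t+1}) = Var(w_t) = 1 \\
\gamma(k) &= 0 \quad \text{for } k \ge 2 \\
\rho(1) &= \gamma(1) / \gamma(0) = 1/2
\end{aligned}
```

None of these quantities depends on $t$, so under this assumption the process satisfies all three weak stationarity conditions.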
Autoregressive
Let $w_t \sim \mathcal{N}(0, 1)$ and $y_t = y_{t-1} - 0.9\,y_{t-2} + w_t$ with $y_t = 0$ for $t < 1$.
[Figure: Autoregressive, y against t for t = 0 to 500]
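The recursion can be simulated with an explicit loop that treats values before $t = 1$ as zero. This is a minimal sketch: the seed is arbitrary and the name `ar` follows the `ar$y` plot label.

```r
set.seed(444)   # arbitrary seed
n <- 500
w <- rnorm(n)   # w_t ~ N(0, 1)
y <- numeric(n)

for (i in 1:n) {
  y_lag1 <- if (i > 1) y[i - 1] else 0  # y_t = 0 for t < 1
  y_lag2 <- if (i > 2) y[i - 2] else 0
  y[i] <- y_lag1 - 0.9 * y_lag2 + w[i]
}

ar <- data.frame(t = 1:n, y = y)
print(head(ar))
```

The same series can be produced without a loop by `stats::filter(w, c(1, -0.9), method = "recursive")`, which applies the identical recursion with zero initial values.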
ACF + PACF
[Figure: ar$y series (t = 0 to 500) with its ACF and PACF for lags 0 to 50]
Example - Australian Wine Sales
Australian total wine sales by wine makers in bottles <= 1 litre. Jan 1980 – Aug 1994.
aus_wine = readRDS("../data/aus_wine.rds")
aus_wine
## # A tibble: 176 x 2
##     date sales
##    <dbl> <dbl>
##  1 1980  15136
##  2 1980. 16733
##  3 1980. 20016
##  4 1980. 17708
##  5 1980. 18019
##  6 1980. 19227
##  7 1980. 22893
##  8 1981. 23739
##  9 1981. 21133
## 10 1981. 22591
## # ... with 166 more rows
Time series
[Figure: sales against date, 1980 to 1995]
Basic Model Fit
[Figure: sales against date with linear and quadratic model fits]
Residuals
[Figure: lin_resid and quad_resid against date, 1980 to 1995]
Autocorrelation Plot
[Figure: d$quad_resid series with its ACF and PACF for lags 0 to 35]
[Figure: quad_resid against lag_value, one panel per lag, lag1 through lag12]
Auto regressive errors
## 
## Call:
## lm(formula = quad_resid ~ lag_12, data = d_ar)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12286.5  -1380.5     73.4   1505.2   7188.1 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  83.65080  201.58416   0.415    0.679    
## lag_12        0.89024    0.04045  22.006   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2581 on 162 degrees of freedom
##   (12 observations deleted due to missingness)
## Multiple R-squared:  0.7493, Adjusted R-squared:  0.7478 
## F-statistic: 484.3 on 1 and 162 DF,  p-value: < 2.2e-16
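The code that produced this output is not shown on the slide. The sketch below is one plausible reconstruction of the workflow in base R: fit a quadratic trend, take its residuals, and regress them on their own value 12 months earlier. It runs on a synthetic stand-in for `aus_wine` (the real data file is not available here), so its coefficients will not match the output above. The names `d`, `d_ar`, `l_ar`, `quad_resid` and `lag_12` follow the slides; the exact model formula and the way the lag was built are assumptions.

```r
set.seed(444)  # arbitrary seed
n <- 176
month <- 0:(n - 1)
date <- 1980 + month / 12  # monthly, Jan 1980 onward

# SYNTHETIC stand-in for aus_wine$sales (trend + yearly cycle + noise), NOT the real data
sales <- 15000 + 600 * (date - 1980) + 3000 * sin(2 * pi * month / 12) +
  rnorm(n, sd = 1500)
d <- data.frame(date = date, sales = sales)

# Quadratic trend fit and its residuals (formula is an assumption)
l_quad <- lm(sales ~ date + I(date^2), data = d)
d$quad_resid <- resid(l_quad)

# Residual from 12 months earlier; the first 12 rows have no such value
d_ar <- d
d_ar$lag_12 <- c(rep(NA, 12), head(d$quad_resid, -12))

# lm() drops the 12 rows with missing lag_12
l_ar <- lm(quad_resid ~ lag_12, data = d_ar)
print(coef(l_ar))
```

With 176 observations, dropping 12 rows and estimating 2 coefficients leaves 162 residual degrees of freedom, matching the count reported in the output above.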
Residual residuals
[Figure: resid against date, 1980 to 1995]
Residual residuals - acf
[Figure: l_ar$residuals series with its ACF and PACF for lags 0 to 35]
[Figure: resid against lag_value, one panel per lag, lag1 through lag12]
Writing down the model?
So, is our EDA suggesting that we fit the following model?
$$\text{sales}(t) = \beta_0 + \beta_1\, t + \beta_2\, t^2 + \beta_3\, \text{sales}(t - 12) + \epsilon_t$$
the model we actually fit is,
$$\text{sales}(t) = \beta_0 + \beta_1\, t + \beta_2\, t^2 + w_t$$
where
$$w_t = \delta\, w_{t-12} + \epsilon_t$$
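To see how the two formulations differ, the error recursion can be substituted back into the fitted model. This is a worked sketch using only the definitions above.

```latex
\begin{aligned}
w_{t-12} &= \text{sales}(t-12) - \beta_0 - \beta_1 (t-12) - \beta_2 (t-12)^2 \\
\text{sales}(t) &= \beta_0 + \beta_1 t + \beta_2 t^2
  + \delta \big[ \text{sales}(t-12) - \beta_0 - \beta_1 (t-12) - \beta_2 (t-12)^2 \big]
  + \epsilon_t
\end{aligned}
```

Both forms are quadratic in $t$ plus a multiple of $\text{sales}(t-12)$. In the first model the trend coefficients and $\beta_3$ are free parameters, while here the coefficient on $\text{sales}(t-12)$ is $\delta$ and the trend terms are tied to $\delta$ and the $\beta$s.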