Lecture 10 - Forecasting and Model Fitting
Colin Rundel
02/20/2017
Forecasting
Forecasting ARMA

• Forecasts from stationary models necessarily revert to the mean
  • Remember, $E(y_t) \neq \delta$ but rather $\delta / (1 - \sum_{i=1}^p \phi_i)$.
• Forecasts from differenced models revert to the trend (usually a line)
• Why? The AR terms gradually damp out and the MA terms disappear after $q$ steps
• Like any other model, accuracy decreases as we extrapolate / the prediction interval widens
One step ahead forecasting

Take a fitted ARMA(1,1) process where we know $\delta$, $\phi$, and $\theta$, then

$$
\begin{aligned}
y_n &= \delta + \phi \, y_{n-1} + \theta \, w_{n-1} + w_n \\
\hat{y}_{n+1} &= \delta + \phi \, y_n + \theta \, w_n + w_{n+1} \approx \delta + \phi \, y_n + \theta \, (y_n - \hat{y}_n) + 0 \\
\hat{y}_{n+2} &= \delta + \phi \, y_{n+1} + \theta \, w_{n+1} + w_{n+2} \approx \delta + \phi \, \hat{y}_{n+1} + \theta \cdot 0 + 0
\end{aligned}
$$

Unobserved innovations are replaced by their expectation (0), the most recent innovation $w_n$ by the residual $y_n - \hat{y}_n$, and unobserved future values by their own forecasts.
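A quick illustration of this recursion using the forecast package. This sketch is not from the slides; the simulated series, seed, and parameter values are arbitrary. Past the first step the forecasts decay toward the process mean, as described on the previous slide.

library(forecast)

set.seed(1)                                   # arbitrary seed
y = arima.sim(list(ar = 0.7, ma = 0.4), n = 200)
fit = Arima(y, order = c(1, 0, 1))

# h-step ahead forecasts; as h grows they revert to the estimated mean
forecast(fit, h = 50) %>% plot()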
ARIMA(3,1,1) Example
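The original example itself is not shown; a sketch of what such an example might look like (coefficients, seed, and series length are all illustrative assumptions):

set.seed(1)
y = arima.sim(list(order = c(3, 1, 1), ar = c(0.3, 0.2, 0.1), ma = 0.5), n = 300)
fit = Arima(y, order = c(3, 1, 1))
forecast(fit, h = 50) %>% plot()   # forecasts revert to the (linear) trend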
Model Fitting
Fitting ARIMA - MLE

For an ARIMA($p$, $d$, $q$) model:

• Requires that the data be stationary after differencing
• Handling $d$ is straightforward: just difference the original data $d$ times (leaving $n - d$ observations),
  $$y'_t = \Delta^d y_t$$
• After differencing, fit an ARMA($p$, $q$) model to $y'_t$ (a sketch of this reduction follows below)
• To keep things simple we'll assume $w_t \overset{iid}{\sim} N(0, \sigma^2_w)$
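As a concrete sketch of the reduction (not from the slides; the series, seed, and orders are illustrative), fitting an ARMA($p$, $q$) to the differenced series matches fitting the ARIMA($p$, $d$, $q$) directly:

library(forecast)

set.seed(1)
y = cumsum(arima.sim(list(ar = 0.5, ma = 0.3), n = 250))  # an ARIMA(1,1,1) realization

y_prime = diff(y, differences = 1)                        # d = 1, leaves n - 1 observations
Arima(y_prime, order = c(1, 0, 1), include.mean = FALSE)  # ARMA(1,1) on y'
Arima(y, order = c(1, 1, 1))                              # same fit in one step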
Stationarity & normal errors

If both of these conditions are met, then the time series $y_t$ will also be normal.

In general, the vector $\mathbf{y} = (y_1, y_2, \ldots, y_t)'$ will have a multivariate normal distribution with mean $\mu$ and covariance $\Sigma$, where $\Sigma_{ij} = \text{Cov}(y_i, y_j) = \gamma(i - j)$.

The joint density of $\mathbf{y}$ is given by

$$
f_{\mathbf{y}}(\mathbf{y}) = \frac{1}{(2\pi)^{t/2} \, \det(\Sigma)^{1/2}} \times \exp\left( -\frac{1}{2} (\mathbf{y} - \mu)' \, \Sigma^{-1} \, (\mathbf{y} - \mu) \right)
$$
AR
Fitting AR(1)

$$y_t = \delta + \phi \, y_{t-1} + w_t$$

We need to estimate three parameters: $\delta$, $\phi$, and $\sigma^2_w$. We know

$$
\begin{aligned}
E(y_t) &= \frac{\delta}{1 - \phi} \\
Var(y_t) &= \frac{\sigma^2_w}{1 - \phi^2} \\
Cov(y_t, y_{t+h}) &= \frac{\sigma^2_w}{1 - \phi^2} \, \phi^{|h|}
\end{aligned}
$$

Using these properties it is possible to write down the multivariate normal distribution of $\mathbf{y}$, but it is not that easy to write down a closed form density which we can then use to find the MLE.
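These three properties are enough to build $\mu$ and $\Sigma$ and evaluate the joint density numerically. A sketch (not from the slides) using the mvtnorm package, with illustrative parameter values and a simulated series:

library(mvtnorm)

phi = 0.75; delta = 0.5; sigma2_w = 1
n = 100

set.seed(1)
y = arima.sim(list(ar = phi), n = n, sd = sqrt(sigma2_w)) + delta / (1 - phi)

# Sigma_ij = gamma(i - j) = sigma2_w / (1 - phi^2) * phi^|i - j|
Sigma = sigma2_w / (1 - phi^2) * phi^abs(outer(1:n, 1:n, "-"))

dmvnorm(as.numeric(y), mean = rep(delta / (1 - phi), n), sigma = Sigma, log = TRUE)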
Conditional Density

We can rewrite the density as follows,

$$
\begin{aligned}
f(\mathbf{y}) &= f(y_t, y_{t-1}, \ldots, y_2, y_1) \\
&= f(y_t | y_{t-1}, \ldots, y_2, y_1) \, f(y_{t-1} | y_{t-2}, \ldots, y_2, y_1) \cdots f(y_2 | y_1) \, f(y_1) \\
&= f(y_t | y_{t-1}) \, f(y_{t-1} | y_{t-2}) \cdots f(y_2 | y_1) \, f(y_1)
\end{aligned}
$$

where,

$$
y_1 \sim N\left( \frac{\delta}{1 - \phi}, \; \frac{\sigma^2_w}{1 - \phi^2} \right)
\qquad
y_t | y_{t-1} \sim N\left( \delta + \phi \, y_{t-1}, \; \sigma^2_w \right)
$$

$$
f_{y_t | y_{t-1}}(y_t) = \frac{1}{\sqrt{2 \pi \sigma^2_w}} \exp\left( - \frac{(y_t - \delta - \phi \, y_{t-1})^2}{2 \sigma^2_w} \right)
$$
Log likelihood of AR(1)

$$
\log f_{y_t | y_{t-1}}(y_t) = -\frac{1}{2} \left( \log 2\pi + \log \sigma^2_w + \frac{(y_t - \delta - \phi \, y_{t-1})^2}{\sigma^2_w} \right)
$$

$$
\begin{aligned}
\ell(\delta, \phi, \sigma^2_w) &= \log f(\mathbf{y}) = \log f(y_1) + \sum_{i=2}^n \log f(y_i | y_{i-1}) \\
&= -\frac{1}{2} \left( \log 2\pi + \log \sigma^2_w - \log(1 - \phi^2) + \frac{(1 - \phi^2)\left(y_1 - \delta/(1 - \phi)\right)^2}{\sigma^2_w} \right) \\
&\quad - \frac{1}{2} \left( (n - 1) \log 2\pi + (n - 1) \log \sigma^2_w + \sum_{i=2}^n \frac{(y_i - \delta - \phi \, y_{i-1})^2}{\sigma^2_w} \right) \\
&= -\frac{1}{2} \left( n \log 2\pi + n \log \sigma^2_w - \log(1 - \phi^2) + \frac{1}{\sigma^2_w} \left( (1 - \phi^2)\left(y_1 - \frac{\delta}{1-\phi}\right)^2 + \sum_{i=2}^n (y_i - \delta - \phi \, y_{i-1})^2 \right) \right)
\end{aligned}
$$
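This log likelihood is easy to code up and maximize numerically. A minimal sketch (not from the slides; the function and variable names are made up), parameterizing $\sigma^2_w$ on the log scale so the optimizer stays in the valid region:

ar1_loglik = function(par, y) {
  delta = par[1]; phi = par[2]; sigma2_w = exp(par[3])
  if (abs(phi) >= 1) return(-Inf)              # outside the stationary region
  n = length(y)
  # stationary distribution of y_1
  ll_1 = dnorm(y[1], mean = delta / (1 - phi),
               sd = sqrt(sigma2_w / (1 - phi^2)), log = TRUE)
  # conditional terms for y_2, ..., y_n
  ll_rest = sum(dnorm(y[-1], mean = delta + phi * y[-n],
                      sd = sqrt(sigma2_w), log = TRUE))
  ll_1 + ll_rest
}

set.seed(1)
y = arima.sim(list(ar = 0.75), n = 200) + 2    # delta = 0.5, phi = 0.75, mean = 2
mle = optim(c(0, 0, 0), ar1_loglik, y = y, control = list(fnscale = -1))
mle$par                                        # estimates of delta, phi, log(sigma2_w)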
AR(1) Example

An AR(1) series with $\phi = 0.75$, $\delta = 0.5$, and $\sigma^2_w = 1$:

[Figure: simulated AR(1) series ar1, plotted over t = 1 to 200]
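The slide's simulation code is not shown; the series can be reproduced along these lines (the seed is an assumption):

set.seed(1)
ar1 = arima.sim(n = 200, list(ar = 0.75), sd = 1) + 0.5 / (1 - 0.75)
plot(ar1)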
Arima

Arima(ar1, order = c(1,0,0)) %>% summary()

## Series: ar1 
## ARIMA(1,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1    mean
##       0.7593  1.8734
## s.e.  0.0454  0.3086
## 
## sigma^2 estimated as 1.149:  log likelihood=-297.14
## AIC=600.28   AICc=600.4   BIC=610.17
## 
## Training set error measures:
##                       ME     RMSE       MAE       MPE     MAPE      MASE        ACF1
## Training set 0.004616374 1.066741 0.8410635 -327.6919 664.3204 0.9186983 -0.00776572
lm

lm(ar1 ~ lag(ar1)) %>% summary()

## Call:
## lm(formula = ar1 ~ lag(ar1))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.1863 -0.7596  0.0779  0.6099  2.8638 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.4530     0.1161   3.904  0.00013 ***
## lag(ar1)      0.7621     0.0461  16.530  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.074 on 197 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.5811, Adjusted R-squared:  0.5789 
## F-statistic: 273.2 on 1 and 197 DF,  p-value: < 2.2e-16
Bayesian AR(1) Model

## model{
##   # likelihood
##   y[1] ~ dnorm(delta/(1-phi), (sigma2_w/(1-phi^2))^-1)
##   y_hat[1] ~ dnorm(delta/(1-phi), (sigma2_w/(1-phi^2))^-1)
##   for (t in 2:length(y)) {
##     y[t] ~ dnorm(delta + phi*y[t-1], 1/sigma2_w)
##     y_hat[t] ~ dnorm(delta + phi*y[t-1], 1/sigma2_w)
##   }
##   mu <- delta/(1-phi)
## 
##   # priors
##   delta ~ dnorm(0,1/1000)
##   phi ~ dnorm(0,1)
##   tau ~ dgamma(0.001,0.001)
##   sigma2_w <- 1/tau
## }
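One way to run the model above with rjags (a sketch; the string name ar1_model and the monitored variables are assumptions, not shown in the slides):

library(rjags)

m = jags.model(textConnection(ar1_model),      # ar1_model holds the JAGS code above
               data = list(y = as.numeric(ar1)))
update(m, n.iter = 1000)                       # burn-in
samp = coda.samples(m, variable.names = c("delta", "phi", "sigma2_w"),
                    n.iter = 5000)
summary(samp)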
Posteriors

[Figure: posterior densities of delta, phi, and sigma2_w]
Random Walk with Drift

The same models fit to a random walk with drift: $\phi = 1$, $\delta = 0.1$, and $\sigma^2_w = 1$.

[Figure: simulated random walk with drift, rwd, over t = 1 to 500, with its ACF and PACF]
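One way the series might be generated (a sketch; the seed is an assumption):

set.seed(1)
rwd = ts(cumsum(0.1 + rnorm(500)))   # y_t = y_{t-1} + 0.1 + w_t,  w_t ~ N(0, 1)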
lm

lm(rwd ~ lag(rwd)) %>% summary()

## Call:
## lm(formula = rwd ~ lag(rwd))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.83634 -0.71725  0.00629  0.69476  3.13117 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.083981   0.068588   1.224    0.221    
## lag(rwd)    1.001406   0.002632 380.494   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.004 on 498 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.9966, Adjusted R-squared:  0.9966 
## F-statistic: 1.448e+05 on 1 and 498 DF,  p-value: < 2.2e-16
Arima

Arima(rwd, order = c(1,0,0), include.constant = TRUE) %>% summary()

## Series: rwd 
## ARIMA(1,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1     mean
##       0.9992  26.4894
## s.e.  0.0010  23.5057
## 
## sigma^2 estimated as 1.021:  log likelihood=-718.33
## AIC=1442.66   AICc=1442.7   BIC=1455.31
## 
## Training set error measures:
##                     ME     RMSE       MAE  MPE MAPE      MASE       ACF1
## Training set 0.1041264 1.008427 0.8142404 -Inf  Inf 0.9996364 0.01365841
Bayesian Posteriors

[Figure: posterior densities of delta, phi, and sigma2_w; the posterior for phi piles up just below 1]
Non-stationary Bayesian Model

The same model as before, but with the stationary-distribution terms for y[1] and y_hat[1] commented out:

## model{
##   # likelihood
##   #y[1] ~ dnorm(delta/(1-phi), (sigma2_w/(1-phi^2))^-1)
##   #y_hat[1] ~ dnorm(delta/(1-phi), (sigma2_w/(1-phi^2))^-1)
##   for (t in 2:length(y)) {
##     y[t] ~ dnorm(delta + phi*y[t-1], 1/sigma2_w)
##     y_hat[t] ~ dnorm(delta + phi*y[t-1], 1/sigma2_w)
##   }
##   mu <- delta/(1-phi)
## 
##   # priors
##   delta ~ dnorm(0,1/1000)
##   phi ~ dnorm(0,1)
##   tau ~ dgamma(0.001,0.001)
##   sigma2_w <- 1/tau
## }
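Running it and collecting posterior draws, so the stationarity check on the next slide has something to work with (a sketch; rwd_model and rwd_params are assumed names, not from the slides):

library(rjags)

m = jags.model(textConnection(rwd_model),      # rwd_model holds the JAGS code above
               data = list(y = as.numeric(rwd)))
update(m, n.iter = 1000)                       # burn-in
samp = coda.samples(m, variable.names = c("delta", "phi", "sigma2_w"),
                    n.iter = 5000)
rwd_params = as.data.frame(as.matrix(samp))    # one column per parameter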
NS Bayesian Posteriors

[Figure: posterior densities of delta, phi, and sigma2_w under the non-stationary model; the posterior for phi now concentrates around 1]
Probability of being stationary

rwd_params$phi %>% abs() %>% {. < 1} %>% {sum(.) / length(.)}

## [1] 0.3046

Only about 30% of the posterior draws of phi fall in the stationary region $|\phi| < 1$, consistent with the data having been generated with $\phi = 1$.
Correct ARIMA

Arima(rwd, order = c(0,1,0), include.constant = TRUE) %>% summary()

## Series: rwd 
## ARIMA(0,1,0) with drift 
## 
## Coefficients:
##        drift
##       0.1117
## s.e.  0.0448
## 
## sigma^2 estimated as 1.007:  log likelihood=-710.63
## AIC=1425.26   AICc=1425.29   BIC=1433.69
## 
## Training set error measures:
##                         ME     RMSE       MAE  MPE MAPE      MASE       ACF1
## Training set -2.228961e-07 1.001325 0.8082318 -Inf  Inf 0.9922597 0.01027574
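A sanity check on the drift estimate (a sketch, not from the slides): for an ARIMA(0,1,0) with drift, the drift estimate is just the sample mean of the first differences, and its standard error is the usual standard error of a mean.

d = diff(rwd)
mean(d)                     # should match the drift estimate (~0.11)
sd(d) / sqrt(length(d))     # should roughly match its standard error (~0.045)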
Fitting AR(p)

We can rewrite the density as follows,

$$
\begin{aligned}
f(\mathbf{y}) &= f(y_1, y_2, \ldots, y_{n-1}, y_n) \\
&= f(y_1, y_2, \ldots, y_p) \, f(y_{p+1} | y_1, \ldots, y_p) \cdots f(y_n | y_{n-p}, \ldots, y_{n-1})
\end{aligned}
$$

Regressing $y_t$ on $y_{t-p}, \ldots, y_{t-1}$ gets us an approximate solution (see the sketch below), but it ignores the $f(y_1, y_2, \ldots, y_p)$ part of the likelihood.

How much does this matter (vs. using the full likelihood)?

• If $p$ is not much smaller than $n$, then probably a lot
• If $p \ll n$, then probably not much
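A sketch of the regression approximation (conditional least squares; the function name is illustrative, not from the slides), using embed() to build the lag matrix:

ar_p_cls = function(y, p) {
  X = embed(as.numeric(y), p + 1)   # row t: (y_t, y_{t-1}, ..., y_{t-p})
  lm(X[, 1] ~ X[, -1])              # regress y_t on its first p lags
}

coef(ar_p_cls(ar1, p = 2))          # intercept ~ delta, slopes ~ phi_1, phi_2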
ARMA