Forecasting in R
Evaluating modeling accuracy
Bahman Rostami-Tabar
Outline
1 Residual diagnostics
2 Evaluating point forecast accuracy
3 Time series cross-validation (TSCV)
4 Evaluating prediction interval accuracy
5 Lab session 6
Forecasting residuals

Residuals in forecasting: the difference between an observed value and its fitted value, $e_t = y_t - \hat{y}_{t|t-1}$.

Assumptions
1 $\{e_t\}$ are uncorrelated. If they aren't, then there is information left in the residuals that should be used in computing forecasts.
2 $\{e_t\}$ have mean zero. If they don't, then the forecasts are biased.

Useful properties (for prediction intervals)
3 $\{e_t\}$ have constant variance.
4 $\{e_t\}$ are normally distributed.
Example: Antidiabetic drug sales

[Figure: monthly antidiabetic drug sales, 2000-2008, showing the data and fitted values; Sales (US$).]
Example: Antidiabetic drug sales

augment(fit) %>%
  autoplot(.resid) +
  xlab("Month") + ylab("") +
  ggtitle("Residuals from naïve method")

[Figure: residuals from the naïve method, 2000-2008.]
Example: Antidiabetic drug sales

augment(fit) %>%
  ggplot(aes(x = .resid)) +
  geom_histogram(bins = 30) +
  ggtitle("Histogram of residuals")

[Figure: histogram of residuals.]
Example: Antidiabetic drug sales

augment(fit) %>%
  ACF(.resid) %>%
  autoplot() +
  ggtitle("ACF of residuals")

[Figure: ACF of residuals.]
ACF of residuals

We assume that the residuals are white noise (uncorrelated, mean zero, constant variance). If they aren't, then there is information left in the residuals that should be used in computing forecasts.

So a standard residual diagnostic is to check the ACF of the residuals of a forecasting method. We expect these to look like white noise.
Portmanteau tests

Consider a whole set of $r_k$ values, and develop a test to see whether the set is significantly different from a zero set.

Box-Pierce test
$$Q = T \sum_{k=1}^{h} r_k^2$$
where $h$ is the maximum lag being considered and $T$ is the number of observations.

If each $r_k$ is close to zero, $Q$ will be small. If some $r_k$ values are large (positive or negative), $Q$ will be large.
Ljung-Box test
$$Q^* = T(T+2) \sum_{k=1}^{h} (T-k)^{-1} r_k^2$$
where $h$ is the maximum lag being considered and $T$ is the number of observations.

Preferences: $h = 10$ for non-seasonal data, $h = 2m$ for seasonal data.
Better performance, especially in small samples.
If the data are white noise, $Q^*$ has a $\chi^2$ distribution with $(h - K)$ degrees of freedom, where $K$ = number of parameters in the model. When applied to raw data, set $K = 0$.

augment(fit) %>%
  features(.resid, ljung_box, lag = 10, dof = 0)

## # A tibble: 1 x 4
##   Symbol .model       lb_stat lb_pvalue
##   <chr>  <chr>          <dbl>     <dbl>
## 1 GOOG   NAIVE(Close)    7.91     0.637
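As a rough cross-check of what the ljung_box feature reports, the statistic can be computed by hand from the residual autocorrelations. This is a minimal sketch, assuming fit is the fitted model from the previous slides and the fpp3 packages are loaded:

resids <- augment(fit) %>% pull(.resid)          # extract residuals
T_len  <- sum(!is.na(resids))                    # T, the number of observations
h      <- 10                                     # maximum lag, as above
r_k    <- acf(resids, lag.max = h, plot = FALSE,
              na.action = na.pass)$acf[-1]       # r_1, ..., r_h (drop lag 0)
Q_star <- T_len * (T_len + 2) * sum(r_k^2 / (T_len - seq_len(h)))
1 - pchisq(Q_star, df = h)                       # p-value with K = 0

The base R equivalent is Box.test(resids, lag = 10, type = "Ljung-Box", fitdf = 0).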
gg_tsresiduals function

fit %>% gg_tsresiduals()

[Figure: residual time plot, ACF of residuals, and histogram of residuals produced by gg_tsresiduals().]
Evaluating point forecast accuracy
Evaluate forecast accuracy

Residual diagnostics are not a reliable indication of forecast accuracy.
A model which fits the training data well will not necessarily forecast well.
A perfect fit can always be obtained by using a model with enough parameters.
Over-fitting a model to data is just as bad as failing to identify a systematic pattern in the data.
Fitting

[Figure.]
Evaluate forecast accuracy

The accuracy of forecasts can only be determined by considering how well a model performs on new data that were not used when fitting the model.
Forecast accuracy evaluation using test sets

We mimic the real-life situation.
We pretend we don't know some part of the data (the "new" data).
The test data must not be used for any aspect of model training.
Forecast accuracy is based only on the test set.
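A minimal sketch of that workflow, assuming the monthly antidiabetic_drug_sale tsibble used later in these slides (with a Cost column) and the fpp3 packages; the split point is illustrative:

library(fpp3)

# Hold out the final years as a test set; fit on the training data only
train <- antidiabetic_drug_sale %>% filter_index(. ~ "2005 Dec")
test  <- antidiabetic_drug_sale %>% filter_index("2006 Jan" ~ .)

fit <- train %>% model(snaive = SNAIVE(Cost))   # the test set plays no part in fitting
fc  <- fit %>% forecast(h = nrow(test))         # forecasts covering the test period

The only requirement is that no test-set observation is used when estimating the model.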
Training and test series

[Figure: the series split into a training period followed by a test period.]
Split the data

Use functions in dplyr and lubridate such as filter(), filter_index(), slice(), and year().

# Filter the year of interest
antidiabetic_drug_sale %>%
  filter_index("2006" ~ .)

## # A tsibble: 30 x 2 [1M]
##       Month  Cost
##       <mth> <dbl>
##  1 2006 Jan  23.5
##  2 2006 Feb  12.5
##  3 2006 Mar  15.5
##  4 2006 Apr  14.2
##  5 2006 May  17.8
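An equivalent split using dplyr::filter() with lubridate::year() on the yearmonth index, assuming the same antidiabetic_drug_sale tsibble:

antidiabetic_drug_sale %>%
  filter(year(Month) >= 2006)   # keep observations from 2006 onwards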
Forecast errors

Forecast "error": the difference between an observed value and its forecast,
$$e_{T+h} = y_{T+h} - \hat{y}_{T+h|T},$$
where the training data is given by $\{y_1, \dots, y_T\}$.

Unlike residuals, forecast errors on the test set involve multi-step forecasts.
These are true forecast errors, as the test data is not used in computing $\hat{y}_{T+h|T}$.
Measures of forecast accuracy

$y_{T+h}$ = the $(T+h)$th observation, $h = 1, \dots, H$
$\hat{y}_{T+h|T}$ = its forecast based on data up to time $T$
$e_{T+h} = y_{T+h} - \hat{y}_{T+h|T}$

MAE  = mean($|e_{T+h}|$)
MSE  = mean($e_{T+h}^2$)
RMSE = $\sqrt{\text{mean}(e_{T+h}^2)}$
MAPE = 100 mean($|e_{T+h}| / |y_{T+h}|$)

MAE, MSE and RMSE are all scale dependent.
MAPE is scale independent but is only sensible if $y_t \gg 0$ for all $t$, and $y$ has a natural zero.
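These measures (and MASE, defined next) are reported by fabletools::accuracy(). A sketch, continuing from the fc forecasts and the antidiabetic_drug_sale data assumed in the earlier split; the full series is passed so the scaled errors can use the training data:

fc %>%
  accuracy(antidiabetic_drug_sale) %>%
  select(.model, MAE, RMSE, MAPE, MASE)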
Measures of forecast accuracy

Mean Absolute Scaled Error
$$\text{MASE} = \text{mean}(|e_{T+h}| / Q)$$
where $Q$ is a stable measure of the scale of the time series $\{y_t\}$.

For non-seasonal time series,
$$Q = (T-1)^{-1} \sum_{t=2}^{T} |y_t - y_{t-1}|$$
works well. Then MASE is equivalent to MAE relative to a naïve method.

For seasonal time series,
$$Q = (T-m)^{-1} \sum_{t=m+1}^{T} |y_t - y_{t-m}|$$
works well. Then MASE is equivalent to MAE relative to a seasonal naïve method.
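A rough sketch of the scaled error computed by hand for monthly data (m = 12), assuming the train, test and fc objects from the earlier split and that the rows of fc line up with the rows of test:

m <- 12
Q <- mean(abs(diff(train$Cost, lag = m)))   # (T - m)^{-1} * sum |y_t - y_{t-m}|
e <- test$Cost - fc$.mean                   # test-set forecast errors
mean(abs(e)) / Q                            # MASE

In practice accuracy() reports MASE directly, computing Q from the training data.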
Poll: true or false?

1 Good point forecast models should have normally distributed residuals.
2 A model with small residuals will give good forecasts.
3 The best measure of forecast accuracy is MAPE.
4 Always choose the model with the best forecast accuracy as measured on the test set.
Issue with traditional train/test split

[Figure: a single split of the series into a training period and a test period.]
Time series cross-validation

[Figure: a sequence of training sets, each extended by one observation, with the following observations used as test sets.]

Forecast accuracy is averaged over the test sets.
Also known as "evaluation on a rolling forecasting origin".
Creating the rolling training sets

There are three main rolling types which can be used:
Stretch: extends a growing-length window with new data.
Slide: shifts a fixed-length window through the data.
Tile: moves a fixed-length window without overlap.

Three functions to roll a tsibble: stretch_tsibble(), slide_tsibble(), and tile_tsibble().
For time series cross-validation, stretching windows are most commonly used; see the sketch after the figure below.
[Figure: slide, tile, and stretch rolling windows applied to a quarterly Trips series, 2000-2015.]
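A sketch of time series cross-validation with stretching windows, averaging accuracy over the folds with fabletools::accuracy(). It assumes a quarterly tsibble trips with a Trips column (the series shown in the figure; the names are illustrative):

trips_cv <- trips %>%
  stretch_tsibble(.init = 12, .step = 1)   # growing training sets, one quarter at a time

trips_cv %>%
  model(naive = NAIVE(Trips)) %>%
  forecast(h = 4) %>%
  accuracy(trips)                          # accuracy averaged over all test sets

With .init = 12 the first training set holds three years of quarterly data; each later fold adds one more observation.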