ENSEMBLES FOR TIME SERIES FORECASTING Mariana Oliveira & Luís Torgo
Ensembles for Time Series Forecasting ACML 2014 Mariana Oliveira & Luís Torgo (FCUP/LIAAD)
Outline
• Introduction
• Delay-coordinate embedding
• Bagging for Time Series Forecasting
• Bagging Variants
• Experimental Evaluation
• Time series
• Results
• Conclusion
• Future work
Introduction
• Ensembles are among the most competitive approaches to predictive tasks;
• Diversity among ensemble members is essential;
• We aim to improve the predictive performance of ensembles in time series forecasting.
Delay-coordinate embedding
• Delay-coordinate embedding assumes that future values of the series depend only on a limited number of previous values;
• Any regression tool can then be used to obtain a model of the form $Y_{t+h} = f(\langle Y_{t-k}, \ldots, Y_{t-1}, Y_t \rangle)$;
• This requires setting the embed size (k), and often there is no single correct value.
Delay-coordinate embedding
[Figure: an example series y plotted against time, with its embedding for k=3 laid out in columns t-3, t-2, t-1 and t.]
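The embedding step can be sketched in a few lines. This is an illustrative Python sketch, not the authors' R code: the function name `embed_series`, the convention that each row holds exactly `k` consecutive values, and the toy series are all assumptions made for the example.

```python
def embed_series(series, k, h=1):
    """Turn a series into a regression data set.

    Each row of X holds k consecutive values ending at time t (oldest first);
    the matching entry of y is the value observed h steps after t.
    The row layout is an assumption of this sketch.
    """
    if k < 1 or h < 1:
        raise ValueError("k and h must be positive integers")
    X, y = [], []
    # t is the index of the most recent value used as a predictor
    for t in range(k - 1, len(series) - h):
        X.append(list(series[t - k + 1 : t + 1]))
        y.append(series[t + h])
    return X, y


toy_series = [2, 4, 1, 5, 7, 3]  # toy values, not taken from the slides
X, y = embed_series(toy_series, k=3)
# X is [[2, 4, 1], [4, 1, 5], [1, 5, 7]] and y is [5, 7, 3]
```

Any regression learner can then be trained on `X` and `y`; changing `k` changes how much recent history each model sees.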
Bagging for Time Series Forecasting
• We propose variants of bagging of regression trees;
• Our variants generate diversity by exploiting specific properties of time series prediction tasks;
• We compare the performance of our proposals against that of standard bagging, our baseline.
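For reference, standard bagging (the baseline) can be sketched as follows. This is a minimal Python illustration under stated assumptions: a depth-one regression stump stands in for full regression trees to keep the example short, and all function names and the `seed` parameter are hypothetical rather than taken from the authors' R code.

```python
import random


def fit_stump(X, y):
    """Fit a depth-one regression tree (one split on one feature)."""
    mean_y = sum(y) / len(y)
    best_sse, best = float("inf"), (None, None, mean_y, mean_y)
    for j in range(len(X[0])):
        for thr in sorted(set(row[j] for row in X)):
            left = [yi for row, yi in zip(X, y) if row[j] <= thr]
            right = [yi for row, yi in zip(X, y) if row[j] > thr]
            if not left or not right:
                continue  # not a real split
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if sse < best_sse:
                best_sse, best = sse, (j, thr, ml, mr)
    return best  # (feature index, threshold, left mean, right mean)


def predict_stump(stump, row):
    j, thr, ml, mr = stump
    if j is None:  # no valid split was found: predict the overall mean
        return ml
    return ml if row[j] <= thr else mr


def bagging_fit(X, y, n_models, seed=0):
    """Train n_models stumps, each on a bootstrap sample of the rows."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, row):
    """Aggregate by averaging the predictions of all ensemble members."""
    preds = [predict_stump(m, row) for m in models]
    return sum(preds) / len(preds)
```

In this baseline the only source of diversity is the bootstrap sample; every model sees the same predictors.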
Bagging for Time Series Forecasting
• There are many possible ways of describing the recent dynamics of a time series through a set of predictors;
• Our initial set of proposed bagging variants uses
  • different embed sizes, given a maximum embed size $k_{\max}$;
  • summary statistics of recent values as additional predictors.
Bagging variants
[Diagram: two predictor sets. One combines the summary statistics μ and σ² with the lagged values t, t-1, t-2, ..., t-k; the other uses the lagged values t, t-1, t-2, ..., t-k alone.]
Bagging variants
[Diagram: three predictor sets with different embed sizes: t, t-1, ..., t-k; t, t-1, ..., t-k/2; and t, ..., t-k/4.]
Bagging variants
[Diagram: three predictor sets with embed sizes k, k/4 and k/2, each combined with the summary statistics μ and σ².]
Bagging variants
[Diagram: six predictor sets: embed sizes k, k/4 and k/2, each shown both without and with the summary statistics μ and σ².]
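The predictor sets in these diagrams can be sketched as below. This is an illustrative Python sketch under assumptions: the statistics are computed over the same window as the lags, the population variance is used for σ², and the names `build_predictors` and `variant_specs` are hypothetical. How models are allocated to each predictor set is not specified on the slides and is not shown here.

```python
import statistics


def build_predictors(recent, embed_size, with_stats):
    """Build one predictor row from recent values (ordered oldest first).

    embed_size: how many of the most recent values to keep as lags.
    with_stats: if True, append the mean and variance of those lags.
    """
    if embed_size < 1 or embed_size > len(recent):
        raise ValueError("embed_size must be between 1 and len(recent)")
    lags = list(recent[-embed_size:])
    if not with_stats:
        return lags
    mu = statistics.mean(lags)
    sigma2 = statistics.pvariance(lags)  # population variance: an assumption
    return lags + [mu, sigma2]


def variant_specs(k_max):
    """List (embed_size, with_stats) pairs for sizes k_max, k_max/2, k_max/4."""
    if k_max < 4:
        raise ValueError("k_max must be at least 4 so that k_max // 4 >= 1")
    sizes = [k_max, k_max // 2, k_max // 4]
    return [(k, stats) for k in sizes for stats in (False, True)]


recent_values = [1, 2, 3, 4, 5, 6, 7, 8]  # toy values, oldest first
rows = [build_predictors(recent_values, k, s) for k, s in variant_specs(8)]
```

Each entry of `rows` is the input one ensemble member would receive for the same point in time, so members differ in what they see rather than only in their bootstrap sample.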
Experimental Evaluation
• Data: 14 real-world time series;
• Metric: standard Mean Squared Error (MSE);
• Experimental procedure: Monte Carlo simulations
  • 10 randomly selected points in time;
  • training on the previous 50% of the observations;
  • testing on the following 25%;
• Statistical significance: Wilcoxon signed-rank tests with p-value < 0.05;
• Tested setups:
  • different numbers of models in the ensemble (M);
  • different values of the maximum embed size ($k_{\max}$).
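The Monte Carlo procedure can be sketched as follows. This Python sketch rests on assumptions not stated on the slide: the percentages are taken relative to the full series length, the cut points are drawn independently (so repeats are possible), and the names `monte_carlo_splits` and `mse` are hypothetical.

```python
import random


def monte_carlo_splits(n, n_reps=10, train_frac=0.5, test_frac=0.25, seed=0):
    """Return n_reps (train_indices, test_indices) pairs for a series of length n.

    Each pair is built around a random cut point t: the training window is
    the train_frac * n observations before t, and the test window is the
    test_frac * n observations starting at t.
    """
    rng = random.Random(seed)
    train_size = int(n * train_frac)
    test_size = int(n * test_frac)
    if train_size < 1 or test_size < 1:
        raise ValueError("series too short for the requested fractions")
    splits = []
    for _ in range(n_reps):
        # t is the first index of the test window; both bounds are inclusive
        t = rng.randint(train_size, n - test_size)
        splits.append((range(t - train_size, t), range(t, t + test_size)))
    return splits


def mse(actual, predicted):
    """Mean squared error between two equally long sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


splits = monte_carlo_splits(n=100)
```

Because training data always precedes test data, the temporal order of the observations is respected in every repetition.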
Time series
We use the series of differences between successive values of each original time series. Each series was treated separately from the others in its data source.
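Differencing is a one-line transformation. A minimal Python sketch, with a hypothetical function name and toy values:

```python
def first_differences(series):
    """Return the differences between successive values of a list."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


diffs = first_differences([2, 4, 1, 5])
# diffs is [2, -3, 4]
```

The differenced series is one observation shorter than the original.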
Results
Paired comparisons: number of wins (statistically significant wins) / number of losses (statistically significant losses)
Results
Average and standard deviation of rank for each method
Results
Average and standard deviation of the mean percentage difference with respect to the baseline
Results
$\operatorname{sgn}(\mathrm{MSE}_x - \mathrm{MSE}_E) \cdot \log\left(\left|\frac{100 \cdot (\mathrm{MSE}_x - \mathrm{MSE}_E)}{\mathrm{MSE}_E}\right| + 1\right)$
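Assuming the quantity on this slide is a signed logarithm of the percentage difference in MSE, that is sgn(d) · log(|d| + 1) with d = 100 · (MSE_x − MSE_E) / MSE_E, it can be computed as sketched below. The absolute value, the natural logarithm and the function names are assumptions of this Python sketch; the slide does not state them.

```python
import math


def pct_diff(mse_x, mse_e):
    """Percentage difference of mse_x relative to mse_e (mse_e must be > 0)."""
    return 100.0 * (mse_x - mse_e) / mse_e


def signed_log_pct_diff(mse_x, mse_e):
    """Signed log of the percentage difference: sgn(d) * log(|d| + 1)."""
    d = pct_diff(mse_x, mse_e)
    # copysign attaches the sign of d to the non-negative log term
    return math.copysign(math.log(abs(d) + 1.0), d)
```

The transformation keeps the sign of the difference while compressing large magnitudes, which helps when differences across series span several orders of magnitude.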
Conclusion
• We proposed an initial set of ways of injecting diversity into ensembles that take into account the specific challenges posed by time series;
• The recent dynamics of a time series are represented using
  • different embed sizes and
  • additional variables summarizing the recently observed values;
• This was implemented and tested in the context of bagging regression trees, obtaining a clear advantage over standard bagging on real-world data;
• Our results suggest this is a promising research direction.
Future work
• Exploring the possibility of
  • changing the amount of past data used by each model (varying training windows);
  • making the aggregation of the predictions time-dependent;
  • using other types of predictor variants.
Try it yourself:
• All code and data necessary to replicate the results presented are available at http://www.dcc.fc.up.pt/~ltorgo/ACML2014/
• All programs are written in the free and open-source R software environment.