The Effectiveness of Discretization in Forecasting: An Empirical Study on Neural Time Series Models
6th Workshop on Mining and Learning from Time Series @ KDD 2020

Stephan Rabanser (1*) stephan.rabanser@mail.utoronto.ca
Tim Januschowski (2) tjnsch@amazon.com
Valentin Flunkert (2) flunkert@amazon.com
David Salinas (3) david.salinas@naverlabs.com
Jan Gasthaus (2) gasthaus@amazon.com

(1) University of Toronto, Vector Institute
(2) Amazon, AWS AI Labs
(3) NAVER LABS Europe
* Work done at AWS AI Labs

August 24, 2020
Motivation & Setup
• Recent advancements in global forecasting: model architectures and probabilistic outputs.
• We investigate the effects of (discrete) I/O representations.
• $\phi$: input transformation.
• $\psi$: output transformation; it influences the output distribution.
[Figure: forecasting pipeline. The inputs $z_{i,1:T_i}$ and $x_{i,1:T_i+\tau}$ enter the model through $\phi$; $\psi$ and the likelihood (output distribution) produce $z_{i,T_i+1:T_i+\tau}$. Labels: context range $T_i$, prediction range $\tau$, prediction/sampling windows, multiple sample paths.]
Scaling Problem: A Motivating Example (m4 hourly)
[Figure: three columns of plots titled "Original time series", "Time series after scaling", and "Time series after q-transform". Only axis ticks survive in the text; the plotted data are not recoverable.]
Continuous Transforms
Addressing the scaling problem in global forecasting is of utmost importance!

Scaling
Apply an affine transformation to each time series:
• General form: $z'_{i,t} = (z_{i,t} - b_i) / a_i$.
• Classic mean scaling (ms):
  • $a_i = \frac{1}{T_i} \sum_{t=1}^{T_i} |z_{i,t}|$
  • $b_i = 0$
• Lots of possible variations ...

Probability Integral Transform (pit)
Maps a RV $Z$ through its CDF:
• $Y = F_Z(Z)$ with $Y$ being uniform.
• Data preprocessing: make the empirical marginal of each time series approximately uniform [3].
• $z'_{i,t} = \hat{F}_i(z_{i,t})$ with $\hat{F}_i$ being the ECDF for time series $z_{i,1:T_i}$.
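The two continuous transforms above can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' implementation: the function names `mean_scale` and `ecdf_transform`, the `eps` guard for all-zero series, and the "count of observations less than or equal to the value" ECDF convention are assumptions made here.

```python
import numpy as np

def mean_scale(z, eps=1e-8):
    """Mean scaling (ms): z' = (z - b) / a with b = 0 and a = mean(|z|)."""
    z = np.asarray(z, dtype=float)
    # eps is an added assumption that avoids division by zero for all-zero series.
    a = max(np.abs(z).mean(), eps)
    return z / a, a

def ecdf_transform(z):
    """Empirical PIT: map each observation to its ECDF value within its own series."""
    z = np.asarray(z, dtype=float)
    sorted_z = np.sort(z)
    # Fraction of observations in the series that are <= z_t.
    return np.searchsorted(sorted_z, z, side="right") / len(z)

series = np.array([10.0, 20.0, 30.0, 40.0])
scaled, scale = mean_scale(series)      # scale is the mean absolute value of the series
uniformized = ecdf_transform(series)    # values lie in (0, 1]
```

Mean scaling keeps the shape of the series and only changes its magnitude, whereas the ECDF transform keeps only the rank of each observation within its series.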
Discretizing Transforms
• Binning function $b : \mathbb{R} \to \{1, 2, \ldots, B\}$ mapping a real input to a discrete output.
• Each $b \in \{1, \ldots, B\}$ is tied to a bucket $S_b = [l_{b-1}, l_b)$: $b(z) = b$ iff $z \in S_b$.

Equally-Spaced Binning
Construct buckets to be equal in width:
• Only optimal for uniform data.

Quantile Binning (discrete pit)
Construct buckets to be equal in mass:
• Adapts bins to fit the data distribution.
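The difference between equal-width and equal-mass buckets can be seen on a small skewed sample. The sketch below is illustrative only; the helper names are hypothetical, and sending out-of-range values to the first or last bucket is an assumption made here that the slide does not state.

```python
import numpy as np

def equal_width_edges(z, num_bins):
    """Bucket boundaries l_0, ..., l_B that are equally spaced between min and max."""
    z = np.asarray(z, dtype=float)
    return np.linspace(z.min(), z.max(), num_bins + 1)

def quantile_edges(z, num_bins):
    """Bucket boundaries l_0, ..., l_B placed at equally spaced empirical quantiles."""
    z = np.asarray(z, dtype=float)
    return np.quantile(z, np.linspace(0.0, 1.0, num_bins + 1))

def bin_index(z, edges):
    """Return b(z) in {1, ..., B} for half-open buckets S_b = [l_{b-1}, l_b)."""
    inner = edges[1:-1]  # l_1, ..., l_{B-1}
    # Values below l_1 land in bucket 1 and values at or above l_{B-1} land in bucket B
    # (clipping of out-of-range values is an assumption of this sketch).
    return np.searchsorted(inner, z, side="right") + 1

skewed = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 100.0])
num_bins = 4

width_bins = bin_index(skewed, equal_width_edges(skewed, num_bins))
mass_bins = bin_index(skewed, quantile_edges(skewed, num_bins))

# Occupancy per bucket: equal-width buckets are dominated by the outlier,
# while quantile buckets hold the same number of points each.
width_counts = np.bincount(width_bins, minlength=num_bins + 1)[1:]
mass_counts = np.bincount(mass_bins, minlength=num_bins + 1)[1:]
```

With the single outlier at 100, equal-width binning puts seven of the eight points into the first bucket, which is the sense in which it is only well suited to uniform data.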
Our Binning Strategies: Local Absolute & Global Relative Binning
• Local Absolute Binning (lab)
• Global Relative Binning (grb)
• Hybrid Binning (hyb)
[Figure: one schematic per strategy; recoverable block labels: ms, bin, emb, concat.]
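The slide gives only the strategy names and a schematic, so the following is one plausible reading of them rather than the authors' implementation. It assumes that lab fits quantile bin edges on the raw values of each series separately, and that grb first mean-scales every series and then applies one set of quantile edges shared across the whole dataset. The "concat" label suggests that hyb combines embeddings of both bin indices; that step is not shown. All function names are hypothetical.

```python
import numpy as np

def assign_bins(values, inner_edges):
    """Map values to bin indices in {1, ..., B} given the B - 1 inner bin edges."""
    return np.searchsorted(inner_edges, values, side="right") + 1

def inner_quantile_edges(values, num_bins):
    """Inner edges l_1, ..., l_{B-1} placed at equally spaced empirical quantiles."""
    levels = np.linspace(0.0, 1.0, num_bins + 1)[1:-1]
    return np.quantile(values, levels)

def local_absolute_binning(series, num_bins):
    """Assumed lab: edges are fitted on the unscaled values of this one series."""
    series = np.asarray(series, dtype=float)
    return assign_bins(series, inner_quantile_edges(series, num_bins))

def global_relative_binning(dataset, num_bins):
    """Assumed grb: mean-scale each series, then share one set of edges across all."""
    scaled = []
    for series in dataset:
        series = np.asarray(series, dtype=float)
        scaled.append(series / max(np.abs(series).mean(), 1e-8))
    edges = inner_quantile_edges(np.concatenate(scaled), num_bins)
    return [assign_bins(s, edges) for s in scaled]

# Two series with the same shape but very different magnitudes.
dataset = [np.array([1.0, 2.0, 3.0, 4.0]), np.array([1000.0, 2000.0, 3000.0, 4000.0])]
grb_bins = global_relative_binning(dataset, num_bins=2)
lab_bins = [local_absolute_binning(s, num_bins=2) for s in dataset]
```

Under these assumptions, both series receive identical bin indices from grb because mean scaling removes the magnitude difference before the shared edges are applied.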
Models & Output Distributions

Models
We consider three different models which we combine with the aforementioned I/O transformations:
• Simple Feed Forward: SFF
• Autoregressive CNN: WaveNet [2]
• Autoregressive RNN: DeepAR [4]

Output Distributions
We compare three different approaches for modeling the output distribution $p(z_t | h_t)$:
• Student-t distribution (st);
• Piecewise-linear spline quantile function approach of [1] (plqs);
• Categorical distribution (cat).

[Figure: model schematic. Inputs $z_{i,1:T_i}$ and $x_{i,1:T_i+\tau}$ enter the model through $\phi$; $\psi$ and the likelihood (output distribution) produce $z_{i,T_i+1:T_i+\tau}$.]
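For the categorical option (cat), the network output can be read as logits over the $B$ bins. The sketch below shows one way to turn such logits into real-valued forecast samples. Mapping a sampled bin back to its midpoint is an assumption made here (the slide does not say how bins are mapped back to values), and the function names are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()

def sample_categorical_forecast(logits, edges, num_samples, rng):
    """Sample bin indices from p(bin | h_t) and map them to bin midpoints."""
    probs = softmax(np.asarray(logits, dtype=float))
    # Midpoint reconstruction is an assumption of this sketch.
    centers = 0.5 * (edges[:-1] + edges[1:])
    chosen = rng.choice(len(probs), size=num_samples, p=probs)
    return centers[chosen], probs

edges = np.array([0.0, 1.0, 2.0, 4.0])        # three buckets: [0, 1), [1, 2), [2, 4)
logits = np.array([0.0, 0.0, np.log(2.0)])    # third bucket is twice as likely
rng = np.random.default_rng(0)
samples, probs = sample_categorical_forecast(logits, edges, num_samples=100, rng=rng)
```

Unlike the Student-t option, a categorical output makes no parametric assumption about the shape of the distribution, at the cost of a resolution limited by the number of bins.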
Experimental Results
• Varying I/O representations with models on m4, electricity, traffic, wiki.

Output Scaling vs Binning
• Output representation has a large performance impact. Loss differences (max/min/avg):
  • WaveNet: 3.6x / 1.2x / 1.7x
  • DeepAR: 7.6x / 1.4x / 2.9x
  • SFF: 1.8x / 1.0x / 1.2x
• WaveNet profits a lot from binning (8/9); WaveNet with grb performs best (7/9).
• DeepAR shows degradation in performance with binning over ms (avg 2.6x higher loss).
• Mixed results for SFF (no clear winner).

Input Scaling vs Binning
• Input representation has a smaller performance impact. Loss differences (max/min/avg):
  • WaveNet: 3.0x / 1.4x / 1.9x
  • DeepAR: 5.7x / 1.0x / 1.9x
  • SFF: 1.8x / 1.0x / 1.2x
• There is no one clear dominant representation outperforming others.
• Multi-scale hybrid binning often does well (6/9); lab performs badly (9/9).
• grb and pit mostly on par (avg 1.4x).
Binning Resolution Effects (m4 hourly)
[Figure, left panel: mean wQL against the number of input bins (log scale). Performance effects of varying input binning resolutions w.r.t. a fixed 1024-bin q-grb output binning.]
[Figure, right panel: mean wQL against the number of output bins (log scale). Performance effects of varying output binning resolutions w.r.t. a fixed 1024-bin q-grb input binning.]
[Legend entries across the two panels: GRB, LAB, PIT.]
Summary
Picking a good I/O representation is as important as selecting a good model!
Extended Paper: https://arxiv.org/abs/2005.10111
GluonTS: Probabilistic Time Series Modeling Library (Python): https://github.com/awslabs/gluon-ts
References
[1] J. Gasthaus, K. Benidis, Y. Wang, S. S. Rangapuram, D. Salinas, V. Flunkert, and T. Januschowski. Probabilistic Forecasting with Spline Quantile Function RNNs. In The 22nd International Conference on Artificial Intelligence and Statistics, 2019.
[2] A. v. d. Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu. WaveNet: A Generative Model for Raw Audio. arXiv preprint arXiv:1609.03499, 2016.
[3] D. Salinas, M. Bohlke-Schneider, L. Callot, R. Medico, and J. Gasthaus. High-dimensional multivariate forecasting with low-rank Gaussian copula processes. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, pages 6824–6834. Curran Associates, Inc., 2019.
[4] D. Salinas, V. Flunkert, J. Gasthaus, and T. Januschowski. DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks. International Journal of Forecasting, 2019.