The Effectiveness of Discretization in Forecasting: An Empirical Study on Neural Time Series Models
6th Workshop on Mining and Learning from Time Series @ KDD 2020

Stephan Rabanser (1*) stephan.rabanser@mail.utoronto.ca
Tim Januschowski (2) tjnsch@amazon.com
Valentin Flunkert (2) flunkert@amazon.com
David Salinas (3) david.salinas@naverlabs.com
Jan Gasthaus (2) gasthaus@amazon.com

1 University of Toronto, Vector Institute
2 Amazon, AWS AI Labs
3 NAVER LABS Europe
* Work done at AWS AI Labs

August 24, 2020

Motivation & Setup
• Recent advancements in global forecasting: model architectures and probabilistic outputs.
• We investigate effects of (discrete) I/O representations.
• φ: input transformation.
• ψ: output transformation, influences output distribution.
[Figure: forecasting pipeline diagram. Labels: inputs $z_{i,1:T_i}$ and $x_{i,1:T_i+\tau}$, φ, Model, Distribution, Likelihood, ψ, output $z_{i,T_i+1:T_i+\tau}$; context range $T_i$, prediction range (τ), multiple prediction windows, sampling, sample paths.]

Scaling Problem: A Motivating Example (m4 hourly)
[Figure: three columns of panels titled "Original time series", "Time series after scaling", and "Time series after q-transform"; axis tick values omitted.]

Continuous Transforms
Addressing the scaling problem in global forecasting is of utmost importance!

Scaling
Apply an affine transformation to each time series:
• General form: $z'_{i,t} = (z_{i,t} - b_i) / a_i$.
• Classic mean scaling (ms):
  • $a_i = \frac{1}{T_i} \sum_{t=1}^{T_i} |z_{i,t}|$
  • $b_i = 0$
• Lots of possible variations ...

Probability Integral Transform (pit)
Maps a RV $Z$ through its CDF:
• $Y = F_Z(Z)$ with $Y$ being uniform.
• Data preprocessing: make the empirical marginal of each time series approximately uniform [3].
• $z'_{i,t} = \hat{F}_i(z_{i,t})$ with $\hat{F}_i$ being the ECDF for time series $z_{i,1:T_i}$.
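The two continuous transforms on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names `mean_scale` and `ecdf_transform` are hypothetical, and the fallback for an all-zero series is an assumption the slides do not cover.

```python
import numpy as np

def mean_scale(z):
    """Mean scaling (ms): z' = (z - b) / a with a = mean(|z|) and b = 0."""
    z = np.asarray(z, dtype=float)
    a = np.mean(np.abs(z))
    # Assumption: fall back to a = 1 for an all-zero series to avoid dividing by zero.
    if a == 0:
        a = 1.0
    return z / a, a

def ecdf_transform(z):
    """Empirical PIT: z'_t = F_hat(z_t), the fraction of observations <= z_t."""
    z = np.asarray(z, dtype=float)
    sorted_z = np.sort(z)
    # side="right" counts how many observations are <= each value.
    ranks = np.searchsorted(sorted_z, z, side="right")
    return ranks / len(z)

z = np.array([10.0, 20.0, 30.0, 40.0])
scaled, a = mean_scale(z)   # a is the per-series scale, needed to undo the transform
u = ecdf_transform(z)       # values in (0, 1], approximately uniform
print(scaled, a, u)
```

Mean scaling keeps the shape of the series and only changes its magnitude, while the ECDF transform discards magnitude entirely and keeps ranks, which is what makes the transformed marginal approximately uniform.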

Discretizing Transforms
• Binning function $b: \mathbb{R} \to \{1, 2, \ldots, B\}$ mapping a real input to a discrete output.
• Each $b \in \{1, \ldots, B\}$ is tied to a bucket $S_b = [l_{b-1}, l_b)$: $b(z) = b$ iff $z \in S_b$.

Equally-Spaced Binning
Construct buckets to be equal in width:
• Only optimal for uniform data.

Quantile Binning (discrete pit)
Construct buckets to be equal in mass:
• Adapts bins to fit the data distribution.
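The difference between equal-width and equal-mass buckets is easiest to see on skewed data. The sketch below is illustrative only: the helper names are hypothetical, the bucket edges are estimated from the sample itself, and values at or beyond the outermost edges are assigned to the first or last bucket, which is an assumption the slide does not specify.

```python
import numpy as np

def equal_width_edges(z, num_bins):
    """Edges l_0 < ... < l_B spaced evenly between min(z) and max(z)."""
    z = np.asarray(z, dtype=float)
    return np.linspace(z.min(), z.max(), num_bins + 1)

def quantile_edges(z, num_bins):
    """Edges l_0 <= ... <= l_B placed at evenly spaced empirical quantiles."""
    z = np.asarray(z, dtype=float)
    return np.quantile(z, np.linspace(0.0, 1.0, num_bins + 1))

def bin_index(z, edges):
    """Return b(z) in {1, ..., B} for buckets S_b = [l_{b-1}, l_b)."""
    z = np.asarray(z, dtype=float)
    # Only interior edges are searched, so out-of-range values land in bucket 1 or B.
    return np.searchsorted(edges[1:-1], z, side="right") + 1

B = 4
z = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 100.0])  # one large outlier

ew_bins = bin_index(z, equal_width_edges(z, B))
q_bins = bin_index(z, quantile_edges(z, B))

# Count how many observations fall into each bucket 1..B.
ew_counts = np.bincount(ew_bins, minlength=B + 1)[1:]
q_counts = np.bincount(q_bins, minlength=B + 1)[1:]
print(ew_counts)  # [7 0 0 1]
print(q_counts)   # [2 2 2 2]
```

With equal-width edges the single outlier stretches the range so that almost all observations collapse into the first bucket, whereas quantile edges spread the observations evenly across buckets.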

Our Binning Strategies: Local Absolute & Global Relative Binning
[Figure: three diagrams titled "Local Absolute Binning (lab)", "Global Relative Binning (grb)", and "Hybrid Binning (hyb)"; block labels: ms, bin, emb, concat.]

Models & Output Distributions

Models
We consider three different models which we combine with the aforementioned I/O transformations:
• Simple Feed Forward: SFF
• Autoregressive CNN: WaveNet [2]
• Autoregressive RNN: DeepAR [4]

Output Distributions
We compare three different approaches for modeling the output distribution $p(z_t \mid h_t)$:
• Student-t distribution (st);
• Piecewise-linear spline quantile function approach of [1] (plqs);
• Categorical distribution (cat).

[Figure: forecasting pipeline diagram. Labels: inputs $z_{i,1:T_i}$ and $x_{i,1:T_i+\tau}$, φ, Model, Distribution, Likelihood, ψ, output $z_{i,T_i+1:T_i+\tau}$.]
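To make the categorical option concrete, the following sketch shows one way a categorical output over bins could be turned back into real-valued samples. It is an assumption-laden illustration rather than the models' actual output layer: the logits stand in for whatever a network would produce from $h_t$, and mapping a sampled bin to its midpoint is one possible choice that the slides do not prescribe.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax turning logits into bin probabilities."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max()
    p = np.exp(shifted)
    return p / p.sum()

def sample_from_categorical(logits, edges, num_samples, rng):
    """Sample bin indices from p(bin | h_t) and map them to real values."""
    probs = softmax(logits)
    # Assumption: represent each bucket [l_{b-1}, l_b) by its midpoint.
    centers = 0.5 * (edges[:-1] + edges[1:])
    bins = rng.choice(len(probs), size=num_samples, p=probs)
    return centers[bins], probs

rng = np.random.default_rng(0)
edges = np.array([0.0, 1.0, 2.0, 4.0])   # hypothetical bucket edges, B = 3
logits = np.array([0.0, 1.0, 2.0])       # hypothetical network outputs
samples, probs = sample_from_categorical(logits, edges, 5, rng)
print(probs, samples)
```

Unlike the Student-t option, a categorical output places no parametric shape on the predictive distribution; its resolution is limited by the number and placement of bins.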

Experimental Results
• Varying I/O representations with models on m4, electricity, traffic, wiki.

Output Scaling vs Binning
• Output representation has large performance impact. Loss differences (max/min/avg):
  • WaveNet: 3.6x / 1.2x / 1.7x
  • DeepAR: 7.6x / 1.4x / 2.9x
  • SFF: 1.8x / 1.0x / 1.2x
• WaveNet profits a lot from binning (8/9), WaveNet with grb performs best (7/9).
• DeepAR shows degradation in performance with binning over ms (avg 2.6x higher loss).
• Mixed results for SFF (no clear winner).

Input Scaling vs Binning
• Input representation has a smaller performance impact. Loss differences (max/min/avg):
  • WaveNet: 3.0x / 1.4x / 1.9x
  • DeepAR: 5.7x / 1.0x / 1.9x
  • SFF: 1.8x / 1.0x / 1.2x
• There is no one clear dominant representation outperforming others.
• Multi-scale hybrid binning often does well (6/9), lab performs badly (9/9).
• grb and pit mostly on par (avg 1.4x).

Binning Resolution Effects (m4 hourly)
[Figure: two panels plotting Mean wQL against the number of bins on a logarithmic axis; legend entries: GRB, LAB, PIT.]
• Left panel (x-axis: number of input bins): performance effects of varying input binning resolutions w.r.t. a fixed 1024-bin q-grb output binning.
• Right panel (x-axis: number of output bins): performance effects of varying output binning resolutions w.r.t. a fixed 1024-bin q-grb input binning.
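The slides report "Mean wQL" without defining it. The sketch below uses a commonly used definition of the weighted quantile loss (twice the summed pinball loss divided by the summed absolute target values, averaged over quantile levels); treating this as the exact metric behind the plots is an assumption, and the function names are hypothetical.

```python
import numpy as np

def quantile_loss(z, z_hat, q):
    """Summed pinball loss of the q-quantile forecast z_hat against targets z."""
    z = np.asarray(z, dtype=float)
    z_hat = np.asarray(z_hat, dtype=float)
    # (q - 1[z < z_hat]) * (z - z_hat) is non-negative for every observation.
    return np.sum((q - (z < z_hat)) * (z - z_hat))

def weighted_quantile_loss(z, z_hat, q):
    """wQL[q]: 2 * pinball loss, normalised by the sum of absolute targets."""
    return 2.0 * quantile_loss(z, z_hat, q) / np.sum(np.abs(z))

def mean_wql(z, quantile_forecasts):
    """Average wQL[q] over all quantile levels in a {q: forecast} mapping."""
    return float(np.mean([weighted_quantile_loss(z, f, q)
                          for q, f in quantile_forecasts.items()]))

z = np.array([10.0, 20.0])
forecasts = {0.5: np.array([12.0, 18.0]), 0.9: np.array([15.0, 25.0])}
score = mean_wql(z, forecasts)
print(score)
```

Because the loss is normalised by the magnitude of the targets, scores are comparable across datasets with very different scales.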

Summary
Picking a good I/O representation is as important as selecting a good model!
• Extended Paper: https://arxiv.org/abs/2005.10111
• GluonTS: Probabilistic Time Series Modeling Library (Python): https://github.com/awslabs/gluon-ts

References
[1] J. Gasthaus, K. Benidis, Y. Wang, S. S. Rangapuram, D. Salinas, V. Flunkert, and T. Januschowski. Probabilistic Forecasting with Spline Quantile Function RNNs. In The 22nd International Conference on Artificial Intelligence and Statistics, 2019.
[2] A. v. d. Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu. WaveNet: A Generative Model for Raw Audio. arXiv preprint arXiv:1609.03499, 2016.
[3] D. Salinas, M. Bohlke-Schneider, L. Callot, R. Medico, and J. Gasthaus. High-dimensional multivariate forecasting with low-rank Gaussian copula processes. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, pages 6824–6834. Curran Associates, Inc., 2019.
[4] D. Salinas, V. Flunkert, J. Gasthaus, and T. Januschowski. DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks. International Journal of Forecasting, 2019.
