  1. ROBUST ONLINE AGGREGATION OF FORECASTS: APPLICATION TO ELECTRICITY LOAD FORECASTING. Pierre Gaillard. October 21, 2015, University of Copenhagen.

  2. The framework of this talk
Sequential prediction of arbitrary time series based on expert forecasts:
• a time series $y_1, \dots, y_n \in \mathbb{R}^d$ is to be predicted
• expert forecasts are available: e.g., given by some stochastic or machine-learning models (for us: black boxes)
At each forecasting instance $t = 1, \dots, n$:
• forecasting black box $k \in \{1, \dots, K\}$ provides a forecast $x_{k,t}$ of $y_t$
• we are asked to form a prediction $\hat y_t$ of $y_t$ with knowledge of
  ◦ the past observations $y_1, \dots, y_{t-1}$
  ◦ the current and past expert forecasts $(x_{k,s})_{s \le t,\ 1 \le k \le K}$
• we observe $y_t$

  3. The framework of this talk (continued)
At each forecasting instance $t = 1, \dots, n$:
• forecasting black box $k \in \{1, \dots, K\}$ provides a forecast $x_{k,t}$ of $y_t$
• typical solution: assign a weight $\hat p_{k,t}$ to each expert and predict
  $\hat y_t = \sum_{k=1}^{K} \hat p_{k,t}\, x_{k,t}$
• we observe $y_t$
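Not from the slides: a minimal Python sketch of this prediction loop, assuming a hypothetical `weight_update` rule that maps the past expert losses to a weight vector on the simplex.

```python
import numpy as np

def aggregate_online(y, X, weight_update):
    """Online aggregation loop (sketch): predict a convex combination of
    expert forecasts, observe the truth, then re-weight the experts.

    y : (n,) observations, revealed one at a time
    X : (n, K) array, X[t, k] = forecast x_{k,t} of expert k for y[t]
    weight_update : hypothetical rule mapping the (t, K) array of past
        expert losses to a weight vector on the simplex
    """
    n, K = X.shape
    p = np.full(K, 1.0 / K)              # uniform weights before any data
    preds = np.empty(n)
    for t in range(n):
        preds[t] = p @ X[t]              # hat{y}_t = sum_k p_{k,t} x_{k,t}
        past_losses = (X[: t + 1] - y[: t + 1, None]) ** 2  # square losses
        p = weight_update(past_losses)   # weights for the next round
    return preds
```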

  4. Evaluation criterion
We consider a convex loss function $\ell : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$, e.g., the square loss $\ell(x, y) = \|x - y\|^2$.
Goal: minimize our average loss
$\hat L_n = \frac{1}{n} \sum_{t=1}^{n} \ell(\hat y_t, y_t)$
Difficulty: no stochastic assumption on the time series,
- neither on the observations $(y_t)$
- nor on the expert forecasts $(x_{k,t})$
They are arbitrary and can be chosen by an adversary. If all experts are bad, good performance is hopeless ➥ relative criterion

  5. The regret: a relative criterion
We evaluate our performance relative to that of the experts:
$\underbrace{\frac{1}{n}\sum_{t=1}^{n} \hat\ell_t}_{\hat L_n \text{ (our performance)}} \;=\; \underbrace{\min_{k=1,\dots,K} \frac{1}{n}\sum_{t=1}^{n} \ell_{k,t}}_{L_n^\star \text{ (reference performance, approximation error)}} \;+\; \underbrace{\frac{1}{n}\sum_{t=1}^{n} \hat\ell_t \;-\; \min_{k=1,\dots,K} \frac{1}{n}\sum_{t=1}^{n} \ell_{k,t}}_{\mathrm{Reg}_n \text{ (average regret, estimation error)}}$
where $\hat\ell_t = \ell(\hat y_t, y_t)$ and $\ell_{k,t} = \ell(x_{k,t}, y_t)$.
Goal: perform almost as well as the best of the experts when $n \to \infty$:
$\limsup_{n \to \infty}\; \sup_{(y_t),\,(x_{k,t})} \mathrm{Reg}_n \le 0$
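As a concrete illustration (not in the slides), the average regret of a run is computed directly from the loss sequences:

```python
import numpy as np

def average_regret(agg_losses, expert_losses):
    """Average regret Reg_n = (1/n) sum_t lhat_t - min_k (1/n) sum_t l_{k,t}.

    agg_losses : (n,) losses hat{ell}_t of the aggregated forecasts
    expert_losses : (n, K) losses ell_{k,t} of the K experts
    """
    return np.mean(agg_losses) - np.min(np.mean(expert_losses, axis=0))
```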

  6. Best convex combination
A more ambitious approximation error:
$\min_{q \in \Delta_K} \frac{1}{n} \sum_{t=1}^{n} \ell\Big( \sum_{k=1}^{K} q_k\, x_{k,t},\, y_t \Big)$, where $\Delta_K = \Big\{ q \in \mathbb{R}_+^K : \sum_{k=1}^{K} q_k = 1 \Big\}$
If an expert provides inaccurate forecasts that compensate for the errors of the other expert forecasts, we should increase its weight.
➥ The gradient trick formalizes this idea.
Example for the square loss: replace the loss $(x_{k,t} - y_t)^2$ of expert $k$ by the linearized loss $(\hat y_t - y_t)(x_{k,t} - y_t)$, where $\hat y_t$ is our prediction.
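A hedged sketch of the gradient trick for the square loss: instead of feeding the aggregation rule each expert's own loss, feed it the linearized loss $\nabla\ell(\hat y_t, y_t) \cdot x_{k,t}$; regret against the best expert on these pseudo-losses then implies regret against the best convex combination on the true losses. The function name is illustrative.

```python
import numpy as np

def linearized_losses(x_t, y_t, y_hat_t):
    """Gradient trick for the square loss (sketch).

    The pseudo-loss of expert k at time t is the gradient of the loss
    at the aggregated prediction, applied to x_{k,t}:
        grad ell(hat{y}_t, y_t) * x_{k,t} = 2 (hat{y}_t - y_t) x_{k,t},
    which matches the slide's (hat{y}_t - y_t)(x_{k,t} - y_t) up to a
    factor 2 and a term that is the same for all experts.
    """
    return 2.0 * (y_hat_t - y_t) * x_t   # x_t is the (K,) forecast vector
```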

  7. Brief summary
A meta-statistical interpretation:
• expert forecasts are given by some statistical forecasting methods, each possibly tuned with a different given set of parameters; they may rely on some stochastic model
• these ensemble forecasts are then combined in a robust and deterministic manner
A trade-off: our final performance expresses these two parts:
$\hat L_n = L_n^\star + \mathrm{Reg}_n$

  8. Application: electricity load forecasting
Goal: one-day-ahead forecasting of the French electricity load.
Data characteristics:
• January 1, 2008 – August 31, 2011 as the training set
• September 1, 2011 – June 15, 2012 (excluding some special days) as the testing set
• electricity demand of EDF customers, at a half-hour step
• typical values: median = 43 496 MW, maximum = 78 922 MW
• three expert forecasters: GAM, CLR, KWF

  9. Data looks like...
[Figure: plot of the half-hourly electricity load data]

  10. Application: electricity load forecasting (continued)
Convex loss functions considered:
• square loss: $\ell(x, y) = (x - y)^2$ ➝ RMSE
• absolute percentage error: $\ell(x, y) = |x - y| / y$ ➝ MAPE
Operational constraint: one-day-ahead prediction at a half-hour step, i.e., 48 aggregated forecasts per day.
Expert forecasters:
• GAM: generalized additive models (see Wood 2006; Wood, Goude, Shaw 2014)
• CLR: curve linear regression (see Cho, Goude, Brossat, Yao 2013, 2014)
• KWF: functional wavelet-kernel approach (see Antoniadis, Paparoditis, Sapatinas 2006; Antoniadis, Brossat, Cugliari, Poggi 2012, 2013)
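For concreteness (not from the slides), the two evaluation metrics as a minimal sketch:

```python
import numpy as np

def rmse(y_hat, y):
    """Root mean squared error, in the units of the data (here MW)."""
    return np.sqrt(np.mean((y_hat - y) ** 2))

def mape(y_hat, y):
    """Mean absolute percentage error, as a fraction (x100 for percent)."""
    return np.mean(np.abs(y_hat - y) / y)
```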

  11. How good are our experts?
Losses: RMSE and MAPE on the testing set (with no warm-up period):
$\sqrt{\frac{1}{n}\sum_{t=1}^{n} (\hat y_t - y_t)^2}$ and $\frac{1}{n}\sum_{t=1}^{n} \frac{|\hat y_t - y_t|}{y_t}$
We look at the performance of the oracles:

            Uniform mean   Best forecaster   Best convex p   Best linear u
RMSE (MW)        725             744              629             629
MAPE (%)        1.18            1.29             1.06            1.06
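These oracles are computed in hindsight. A sketch of the best convex combination under the square loss, using a generic constrained optimizer (a simplex-projected solver would work as well); the function name is illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def best_convex_oracle(y, X):
    """Best fixed convex weights in hindsight for the square loss:
    min over q in the simplex of (1/n) sum_t (q . X[t] - y[t])^2."""
    K = X.shape[1]
    avg_loss = lambda q: np.mean((X @ q - y) ** 2)
    res = minimize(avg_loss, np.full(K, 1.0 / K), method="SLSQP",
                   bounds=[(0.0, 1.0)] * K,
                   constraints={"type": "eq", "fun": lambda q: q.sum() - 1.0})
    return res.x, np.sqrt(res.fun)   # oracle weights and their RMSE
```

Dropping the nonnegativity bounds and the sum-to-one constraint would give, presumably, the "best linear u" oracle of the table.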

  12. A strategy to pick the convex weights
The exponentially weighted average forecaster (EWA)
Parameter: $\eta > 0$. Initialization: $\hat p_1 = (1/K, \dots, 1/K)$.
At each time step $t$, we assign to expert $k$ the weight
$\hat p_{k,t} = \frac{\exp\big( -\eta \sum_{s=1}^{t-1} \ell_{k,s} \big)}{\sum_{j=1}^{K} \exp\big( -\eta \sum_{s=1}^{t-1} \ell_{j,s} \big)}$
Performance: if the loss is convex and bounded by $B$,
$\mathrm{Reg}_n \stackrel{\text{def}}{=} \frac{1}{n}\sum_{t=1}^{n} \hat\ell_t - \min_k \frac{1}{n}\sum_{t=1}^{n} \ell_{k,t} \;\le\; \frac{\log K}{\eta n} + \frac{\eta B^2}{8} \;\le\; B\sqrt{\frac{\log K}{2n}} \quad \text{for } \eta = B^{-1}\sqrt{\frac{8 \log K}{n}}$

  13. A strategy to pick the convex weights (incremental form)
The exponentially weighted average forecaster (EWA)
Parameter: $\eta > 0$. Initialization: $\hat p_1 = (1/K, \dots, 1/K)$.
At each time step $t$, we assign to expert $k$ the weight (equivalent to the previous slide, computed incrementally)
$\hat p_{k,t} = \frac{\hat p_{k,t-1}\, e^{-\eta \ell_{k,t-1}}}{\sum_{j=1}^{K} \hat p_{j,t-1}\, e^{-\eta \ell_{j,t-1}}}$
Performance: if the loss is convex and bounded by $B$,
$\mathrm{Reg}_n \le \frac{\log K}{\eta n} + \frac{\eta B^2}{8} \le B\sqrt{\frac{\log K}{2n}} \quad \text{for } \eta = B^{-1}\sqrt{\frac{8 \log K}{n}}$
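A minimal sketch of EWA in this incremental form, here instantiated with the square loss (not from the slides):

```python
import numpy as np

def ewa_forecaster(y, X, eta):
    """Exponentially weighted average forecaster, incremental form:
    p_{k,t} proportional to p_{k,t-1} * exp(-eta * loss_{k,t-1})."""
    n, K = X.shape
    p = np.full(K, 1.0 / K)
    preds = np.empty(n)
    for t in range(n):
        preds[t] = p @ X[t]                  # aggregated prediction
        losses = (X[t] - y[t]) ** 2          # expert losses at time t
        p = p * np.exp(-eta * (losses - losses.min()))  # shift: stable,
        p /= p.sum()                         # leaves the ratio unchanged
    return preds
```

With the theoretical tuning of the slide, one would call, e.g., `ewa_forecaster(y, X, eta=np.sqrt(8 * np.log(K) / n) / B)`.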

  14. Proof: let's do some maths...
Lemma (Hoeffding). Let $X$ be a random variable taking values in $[0, B]$. Then for any $s \in \mathbb{R}$,
$\log \mathbb{E}\big[ e^{sX} \big] \le s\, \mathbb{E}[X] + \frac{s^2 B^2}{8}$
1. Upper-bound the instantaneous loss $\hat\ell_t$:
$\hat\ell_t = \ell(\hat p_t \cdot x_t, y_t) \le \hat p_t \cdot \ell(x_t, y_t)$   (by convexity)
$\le -\frac{1}{\eta} \log\Big( \sum_{k=1}^{K} \hat p_{k,t}\, e^{-\eta \ell_{k,t}} \Big) + \frac{\eta B^2}{8}$   (by Hoeffding)
$= \ell_{k,t} + \frac{1}{\eta} \log \frac{\hat p_{k,t+1}}{\hat p_{k,t}} + \frac{\eta B^2}{8}$   (by definition of $\hat p_{k,t+1} = \hat p_{k,t}\, e^{-\eta \ell_{k,t}} / \sum_j \hat p_{j,t}\, e^{-\eta \ell_{j,t}}$, for every $k$)
2. Sum over all $t$; the sum telescopes:
$\sum_{t=1}^{n} \big( \hat\ell_t - \ell_{k,t} \big) \le \frac{1}{\eta} \log \frac{\hat p_{k,n+1}}{\hat p_{k,1}} + \frac{\eta n B^2}{8} \le \frac{\log K}{\eta} + \frac{\eta n B^2}{8}$
since $\hat p_{k,n+1} \le 1$ and $\hat p_{k,1} = 1/K$; dividing by $n$ gives the bound.
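The bound is deterministic (it holds for any bounded loss sequence), so it can be sanity-checked numerically on the mixture loss $\hat p_t \cdot \ell_t$ that the proof actually controls. A sketch with arbitrary random losses, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, B = 2000, 5, 1.0
losses = rng.uniform(0.0, B, size=(n, K))        # arbitrary losses in [0, B]

eta = np.sqrt(8.0 * np.log(K) / n) / B           # theoretical tuning
cum = np.zeros(K)
mixture_loss = 0.0
for t in range(n):
    w = np.exp(-eta * (cum - cum.min()))         # EWA weights at time t
    w /= w.sum()
    mixture_loss += w @ losses[t]                # p_t . ell_t
    cum += losses[t]

regret = mixture_loss / n - cum.min() / n
bound = B * np.sqrt(np.log(K) / (2.0 * n))
print(f"regret = {regret:.4f} <= bound = {bound:.4f}")   # holds for any sequence
```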

  15. Calibration of η
Best theoretical value: $\eta^\star = B^{-1}\sqrt{\frac{8 \log K}{n}}$
Issue: $n$ and $B$ are not known in advance!
Solutions:
• "doubling trick"
• adaptive learning rate $\eta_t$ picked according to some theoretical value
• use several learning rates simultaneously...
• calibrate on a grid by choosing $\eta_t \in \arg\min_{\eta} \big\{ \text{loss of exp. weights with } \eta \text{ until time } t-1 \big\}$
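A sketch of the last bullet (the grid calibration), not from the slides: run EWA in parallel for every η in a grid and, at each step, follow the instance with the smallest cumulative loss so far. The function name is illustrative; the square loss is assumed.

```python
import numpy as np

def grid_calibrated(y, X, etas):
    """At each t, follow the eta in the grid whose EWA instance has the
    smallest cumulative loss on rounds 1..t-1 (ties broken arbitrarily)."""
    n, K = X.shape
    E = len(etas)
    cum_expert = np.zeros(K)                  # cumulative expert losses
    cum_loss = np.zeros(E)                    # cumulative loss of each EWA(eta)
    preds = np.empty(n)
    for t in range(n):
        # weights of all parallel EWA instances at time t, one row per eta
        w = np.exp(-np.outer(etas, cum_expert - cum_expert.min()))
        w /= w.sum(axis=1, keepdims=True)     # (E, K) weight matrix
        p_all = w @ X[t]                      # (E,) candidate predictions
        preds[t] = p_all[np.argmin(cum_loss)] # follow the best eta so far
        cum_loss += (p_all - y[t]) ** 2
        cum_expert += (X[t] - y[t]) ** 2
    return preds
```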

  16. Application to electricity load forecasting (continued)
Benchmark and oracles (RMSE, MW):

            Uniform mean   Best forecaster   Best convex p   Best linear u
RMSE (MW)        725             744              629             629

vs. aggregated forecasts with convex weights (RMSE, MW):
Exp. weights (best η for theory)        644
Exp. weights (best η on data)           644
Exp. weights (best η tuned on data)     625
ML-Poly (tuned according to theory)     626

  17. Evolution of the weights
No focus on a single member! The weights change significantly over time and do not converge, illustrating that the performance of the forecasters varies over time.

  18. Are all forecasters useful? Definitely yes!
Keeping all 3 forecasters vs. keeping only the best 2 (RMSE, MW):
Exp. weights   625 ➝ 644
ML-Poly        626 ➝ 646
Forecasters that currently receive little weight can come back later if needed.

  19. Conclusion
This was only a small glimpse into the work performed during my PhD at EDF R&D. I applied the method to many other data sets with good results ➝ universality of the method.
Here, with Olivier, we aim to work on:
• a huge number of experts ➝ sparse and efficient methods
• better calibration of the learning parameter to get faster rates
• lower bounds
• probabilistic forecasts by using the pinball loss
Thanks
