

  1. Forecasting the electricity consumption by aggregating specialized experts
     Pierre Gaillard (EDF R&D, ENS Paris)
     with Yannig Goude (EDF R&D) and Gilles Stoltz (CNRS, ENS Paris, HEC Paris)
     June 2013 – WIPFOR

  2. Goal
     Short-term (one-day-ahead) forecasting of the French electricity consumption.
     [Figure: one week of consumption (GW), Monday to Sunday, ranging roughly from 25 to 40 GW]
     - Many models developed by EDF R&D: parametric, semi-parametric, and non-parametric.
     - Adaptive methods for aggregating models.
     - The electrical scene in France is evolving ⇒ existing models become questionable.

  3. Setting – Sequential prediction with expert advice
     At each instance t:
     - each expert suggests a prediction x_{i,t} of the consumption y_t;
     - we assign a weight to each expert and predict
       \hat{y}_t = \hat{p}_t \cdot x_t = \sum_{i=1}^{N} \hat{p}_{i,t} x_{i,t}.
     Our goal is to minimize our cumulative loss:
       \sum_{t=1}^{T} (\hat{y}_t - y_t)^2 = \min_{i=1,\dots,N} \sum_{t=1}^{T} (x_{i,t} - y_t)^2 + R_T,
     i.e., our loss = loss of the best expert + estimation error.
     A small loss of the best expert requires a good set of experts; a small estimation error R_T requires a good aggregating algorithm.

  4. Setting – Sequential prediction with expert advice
     Same prediction \hat{y}_t = \hat{p}_t \cdot x_t = \sum_{i=1}^{N} \hat{p}_{i,t} x_{i,t}, but measured against a stronger comparison class:
       \sum_{t=1}^{T} (\hat{y}_t - y_t)^2 = \min_{q \in \Delta_N} \sum_{t=1}^{T} (q \cdot x_t - y_t)^2 + R_T,
     i.e., our loss = loss of the best convex combination + estimation error.
     A small loss of the best convex combination requires experts that are as varied as possible; a small estimation error requires a good aggregating algorithm.

  5. Minimizing both approximation and estimation error
       \sum_{t=1}^{T} (\hat{y}_t - y_t)^2 = \min_{q \in \Delta_N} \sum_{t=1}^{T} (q \cdot x_t - y_t)^2 + R_T,
     i.e., our loss = approximation error + estimation error.
     - Approximation error ⇒ a good heterogeneous set of experts (e.g., specializing the experts, bagging, boosting, ...).
     - Estimation error ⇒ an efficient algorithm for aggregating the experts (e.g., exponentially weighted average, exponentiated gradient, ridge, ...).
     Prediction, Learning, and Games, Cesa-Bianchi and Lugosi, 2006
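The gap between the two comparison classes can be seen on a toy example (not from the talk): with two oppositely biased experts, each single expert incurs a constant loss per step, while their average is essentially exact. A minimal sketch with synthetic data:

```python
import numpy as np

y = 30.0 + 5.0 * np.sin(np.arange(200) / 7.0)   # synthetic consumption (GW)
x = np.stack([y + 1.0, y - 1.0])                # expert 1 over-predicts, expert 2 under-predicts

# best single expert: squared error of 1 at every step
best_expert = min(((x[i] - y) ** 2).sum() for i in range(2))

# best convex combination: the two biases cancel
q = np.array([0.5, 0.5])
best_convex = ((q @ x - y) ** 2).sum()
```

Here best_expert is about 200 (one unit of squared error per step) while best_convex is essentially zero, which is why a guarantee against the best convex combination is strictly stronger.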

  6. I. Aggregating algorithms
     Prediction, Learning, and Games, Cesa-Bianchi and Lugosi, 2006

  7. Exponentially weighted average forecaster (EWA)
     At each instance t:
     - each expert suggests a prediction x_{i,t} of the consumption y_t;
     - we assign to expert i the weight
       \hat{p}_{i,t} = \frac{\exp(-\eta \sum_{s=1}^{t-1} (x_{i,s} - y_s)^2)}{\sum_{j=1}^{N} \exp(-\eta \sum_{s=1}^{t-1} (x_{j,s} - y_s)^2)};
     - we predict \hat{y}_t = \sum_{i=1}^{N} \hat{p}_{i,t} x_{i,t}.
     Our cumulative loss is upper bounded by
       \sum_{t=1}^{T} (\hat{y}_t - y_t)^2 \leq \min_{i=1,\dots,N} \sum_{t=1}^{T} (x_{i,t} - y_t)^2 + \sqrt{T \log N},
     i.e., our loss ≤ loss of the best expert + estimation error.
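The EWA update above can be sketched in a few lines; the learning rate eta and the synthetic data are illustrative assumptions, not values from the talk:

```python
import numpy as np

def ewa_weights(past_x, past_y, eta):
    """EWA weights: exp(-eta * cumulative squared loss), normalized over experts."""
    cum_loss = ((past_x - past_y) ** 2).sum(axis=1)   # one cumulative loss per expert
    w = np.exp(-eta * (cum_loss - cum_loss.min()))    # shift exponents for numerical stability
    return w / w.sum()

rng = np.random.default_rng(0)
T, N, eta = 300, 3, 0.01
y = 30.0 + 5.0 * np.sin(np.arange(T) / 7.0)                    # synthetic consumption (GW)
x = y + rng.normal(0.0, [[0.5], [1.0], [2.0]], size=(N, T))    # experts = truth + noise

preds = np.empty(T)
for t in range(T):
    p = ewa_weights(x[:, :t], y[:t], eta)   # uniform weights at t = 0
    preds[t] = p @ x[:, t]

our_loss = ((preds - y) ** 2).sum()
losses = ((x - y) ** 2).sum(axis=1)
# the weights concentrate on the most accurate expert (index 0) as t grows
```

Subtracting the minimum cumulative loss before exponentiating leaves the normalized weights unchanged but avoids underflow when eta times the losses is large.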

  8. Exponentially weighted average forecaster (EWA)
     Same algorithm as before, but the stronger bound is crossed out on the slide:
       \sum_{t=1}^{T} (\hat{y}_t - y_t)^2 \leq \min_{q \in \Delta_N} \sum_{t=1}^{T} (q \cdot x_t - y_t)^2 + ?
     EWA competes with the best single expert, not with the best convex combination.

  9. Motivation of convex combinations
     [Figure: residuals (MW) of Gam, KWF, EWA, and EG from October to April (left); weights assigned by EWA and EG to Gam and KWF over the same period (right)]

  10. Exponentiated gradient forecaster (EG)
      At each instance t:
      - each expert suggests a prediction x_{i,t} of the consumption y_t;
      - we assign to expert i the weight
        \hat{p}_{i,t} \propto \exp(-\eta \sum_{s=1}^{t-1} \ell_{i,s}), where \ell_{i,s} = 2 (\hat{y}_s - y_s) x_{i,s};
      - we predict \hat{y}_t = \sum_{i=1}^{N} \hat{p}_{i,t} x_{i,t}.
      Our cumulative loss is then bounded as follows:
        \sum_{t=1}^{T} (\hat{y}_t - y_t)^2 \leq \min_{q \in \Delta_N} \sum_{t=1}^{T} (q \cdot x_t - y_t)^2 + \sqrt{T \log N},
      i.e., our loss ≤ loss of the best convex combination + estimation error.
      Idea of the proof: by convexity of the square loss,
        \sum_{t=1}^{T} \big[ (\hat{y}_t - y_t)^2 - (q^\star \cdot x_t - y_t)^2 \big] \leq \sum_{t=1}^{T} 2 (\hat{p}_t \cdot x_t - y_t) \, x_t \cdot (\hat{p}_t - q^\star)
          = \sum_{t=1}^{T} \ell_t \cdot (\hat{p}_t - q^\star)
          \leq \sum_{t=1}^{T} \hat{p}_t \cdot \ell_t - \min_i \sum_{t=1}^{T} \ell_{i,t}.
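One EG step is a multiplicative update along the linearized losses \ell_{i,t}. A minimal sketch (eta, the starting weights, and the biased-expert data are assumptions chosen to make the convergence visible):

```python
import numpy as np

def eg_step(p, x_t, y_t, eta):
    """One exponentiated-gradient update of the weight vector p."""
    yhat = p @ x_t
    grad = 2.0 * (yhat - y_t) * x_t        # l_{i,t} = 2 (yhat_t - y_t) x_{i,t}
    w = p * np.exp(-eta * grad)
    return yhat, w / w.sum()

T, eta = 500, 0.05
y = 30.0 + 5.0 * np.sin(np.arange(T) / 7.0)
x = np.stack([y + 1.0, y - 1.0])           # two biased experts; only their average is exact

p = np.array([0.9, 0.1])                   # deliberately bad initial weights
loss = 0.0
for t in range(T):
    yhat, p = eg_step(p, x[:, t], y[t], eta)
    loss += (yhat - y[t]) ** 2

best_expert = min(((x[i] - y) ** 2).sum() for i in range(2))   # = T on this data
# EG drives p toward (0.5, 0.5), so our loss ends far below the best expert's
```

This is exactly the setting of the previous slide: the best single expert loses one unit per step, and only an algorithm competing with convex combinations can close that gap.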

  11. II. A good set of experts

  12. Consider as heterogeneous a set of experts as possible
      Some ideas to get more variety into the set of experts:
      - Consider heterogeneous prediction methods:
        Gam, a semi-parametric method (Generalized Additive Models, Wood, 2006);
        KWF, a functional method based on similarity between days (Clustering functional data using wavelets, Antoniadis et al., 2013).
      - Create new experts from the same method thanks to boosting or bagging.
      - Vary the covariates considered: weather, calendar, ...
      - Specialize the experts: focus on specific situations (cloudy days, ...) during training.

  13. The dataset
      The dataset includes 1,696 days from January 1, 2008 to June 15, 2012:
      - the electricity consumption of EDF customers;
      - side information: weather (temperature, nebulosity, wind), calendar data (date, EJP), loss of clients.
      We remove uncommon days (public holidays ± 2 days), i.e., 55 days each year, and split the dataset into two subsets:
      - Jan. 2008 – Aug. 2011: training set, used to build the experts;
      - Sept. 2011 – Jun. 2012: testing set.

  14. Performance of the forecasting methods and of the aggregating algorithms

      Method   RMSE (MW)
      Gam        847
      KWF       1287
      EWA        813
      EG         778

      [Figure: weights assigned to Gam and KWF by EWA (top) and by EG (bottom), October to April]

  15. Specializing the experts to diversify
      Idea: focus on specific scenarios during the training of the methods.
      Meteorological scenarios:
      - high / low temperature;
      - high / low variation of the temperature (since the previous day, during the day).
      Other scenarios:
      - high / low consumption;
      - winter / summer.
      Such specialized experts suggest predictions only on the days corresponding to their scenario.

  16. Specializing a method in cold days
      At day t, we consider T_t, the average temperature of the day. We normalize T_t to [0, 1] and choose for each day the activation weight
        w_t = (1 - T_t)^2.
      We then train our forecasting method using the prior weights w_t on the training days.
      [Figure: density of the normalized temperature T_t, and the resulting activation weights across one year]
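The cold-day weighting above can be sketched in a few lines. The temperature series and the min–max normalization are assumptions (the slide only says T_t is normalized to [0, 1]):

```python
import numpy as np

temps = np.array([2.0, 8.0, 15.0, 24.0, 30.0])   # daily mean temperatures (°C), made up
t_norm = (temps - temps.min()) / (temps.max() - temps.min())   # T_t rescaled to [0, 1]
w = (1.0 - t_norm) ** 2                           # near 1 on cold days, near 0 on hot days
```

The weights w can then be passed as observation weights when fitting the method, e.g. the weights argument of a weighted regression, so that cold days dominate the fit.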

  17. Weights given in 2008 for several specializing scenarios
      [Figure: activation weights over 2008 for four scenarios: hot / cold days, difference of temperature with the previous day, high / low consumption, and variation of temperature during the day]

  18. Aggregating experts that specialize
      Setting: each day, some experts are active and output predictions (according to their specialization), while the others do not. When expert i is not active, we do not have access to its prediction.
      A solution is to assume that the non-active experts output the same prediction \hat{y}_t as we do, and to solve the fixed-point equation
        \hat{y}_t = \sum_{j \text{ active}} \hat{p}_{j,t} x_{j,t} + \Big( \sum_{i \text{ non-active}} \hat{p}_{i,t} \Big) \hat{y}_t.
      This can be extended to activation functions of the experts taking values in [0, 1].
      Forecasting the electricity consumption by aggregating specialized experts, Devaine et al., 2013
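Because the weights sum to one, the fixed-point equation has a closed form: assigning the non-active mass to \hat{y}_t amounts to renormalizing the weights over the active experts. A minimal sketch with made-up numbers:

```python
import numpy as np

def aggregate_specialized(p, x, active):
    """Solve yhat = sum_{j active} p_j x_j + (sum_{i non-active} p_i) * yhat."""
    a = np.asarray(active, dtype=bool)
    return (p[a] * x[a]).sum() / p[a].sum()   # weights renormalized over active experts

p = np.array([0.5, 0.3, 0.2])                 # weights over three experts (sum to 1)
x = np.array([31.0, 33.0, np.nan])            # the third expert is inactive today
yhat = aggregate_specialized(p, x, [True, True, False])   # (0.5*31 + 0.3*33) / 0.8
```

One can check that yhat indeed satisfies the original equation: plugging it back as the inactive expert's prediction reproduces yhat.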

  19. Performance of the algorithms with specialized experts

      Method       RMSE (MW)
      Gam            847
      KWF           1287
      EWA            813
      EG             778
      Spec + EWA     765
      Spec + EG      714

  20. Performance of the algorithms with specialized experts
      [Figure: RMSE (MW) of Gam, KWF, EWA, Spec.EWA, EG, and Spec.EG, by month of the test period (Sep–Jun) and by hour of the day]
