Forecasting Daily Solar Energy Production Using Robust Regression Techniques
Gilles Louppe (@glouppe), Université de Liège, Belgium
Peter Prettenhofer (@pprett), Graz University of Technology, Austria
Problem statement
Goal: Short-term forecasting of daily solar energy production based on weather forecasts from numerical weather prediction (NWP) models.
Challenges
◮ High volatility: rapidly changing weather conditions
◮ Noisy response: hardware failure
◮ Noisy inputs: inaccuracy of NWP model
[Figure: Solar energy production in April/May 1998; y-axis from 0.0 to 3.5e7, x-axis from Apr 04 1998 to May 30 1998]
Data
Solar energy production
◮ 98 Oklahoma Mesonet sites
◮ Total incoming solar energy in $\mathrm{J\,m^{-2}}$
◮ Time period: 1994 - 2007
Courtesy: Dr. Amy McGovern
Numerical weather prediction
◮ NOAA/NCEP GEFS Reforecast, 5 forecasts per day
◮ Ensemble comprises 11 members (one control)
◮ 15 measurements (temp, humidity, upward radiative flux, ...)
Overview of our approach
[Diagram: Interpolation (Kriging) → Feature engineering → Learning (Gradient Tree Boosting)]
1. Interpolation of meteorological measurements from GEFS grid points onto Mesonet sites;
2. Construction of new variables from the measurement estimates;
3. Forecasting of daily energy production using Gradient Boosted Regression Trees, on the basis of the local measurement estimates.
Kriging
Goal: Estimate meteorological variables (temperature, humidity, ...) locally at all Mesonet sites.
For each day $d$, period $h$ and type $f$ of meteorological measurement:
1. Build a local learning set $L_{dhf} = \{ (x_i = (\mathrm{lat}_i, \mathrm{lon}_i, \mathrm{elevation}_i),\ y_i = \bar{m}_{idhf}) \}$, where $\bar{m}_{idhf}$ is the average value (over the ensemble members) of the measurements of type $f$ at GEFS location $i$, day $d$ and period $h$;
2. Learn a Gaussian Process from $L_{dhf}$, for predicting measurements from coordinates (fitting is performed using nuggets to account for noise in the measurements);
3. Predict measurement estimates $\hat{m}_{jdhf}$ at Mesonet stations $j$ from their coordinates.
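The three kriging steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the names `rbf_kernel` and `krige` are hypothetical, the squared-exponential kernel is an assumed choice, and `length_scale` and `nugget` are fixed by hand here whereas the slides learn the Gaussian Process from the data. Coordinates are standardised so that elevation (in metres) does not dominate latitude and longitude (in degrees), which is also an assumption.

```python
import numpy as np

def rbf_kernel(a, b, length_scale):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def krige(x_train, y_train, x_test, length_scale=1.0, nugget=1e-2):
    """Predict values at x_test from noisy observations at x_train.

    Hypothetical helper: hyper-parameters are fixed, not learned.
    Returns the predictive mean and variance at each test point.
    """
    # Standardise coordinates (lat, lon, elevation live on different scales).
    mu, sd = x_train.mean(axis=0), x_train.std(axis=0)
    sd[sd == 0] = 1.0
    xs, xt = (x_train - mu) / sd, (x_test - mu) / sd
    # Standardise targets so that a unit prior variance is reasonable.
    y_mean, y_std = y_train.mean(), (y_train.std() or 1.0)
    ys = (y_train - y_mean) / y_std
    # The nugget on the diagonal models noise in the measurements.
    K = rbf_kernel(xs, xs, length_scale) + nugget * np.eye(len(xs))
    K_s = rbf_kernel(xt, xs, length_scale)
    mean = y_mean + y_std * (K_s @ np.linalg.solve(K, ys))
    var = 1.0 - np.einsum("ij,ji->i", K_s, np.linalg.solve(K, K_s.T))
    return mean, y_std**2 * np.maximum(var, 0.0)

# Toy stand-in for one (day, period, measurement type): a 4 x 4 grid of
# "GEFS points" whose ensemble-mean temperature rises linearly with latitude.
lats, lons = np.meshgrid(np.arange(34.0, 38.0), np.arange(-100.0, -96.0), indexing="ij")
lat, lon = lats.ravel(), lons.ravel()
elev = 300.0 + 100.0 * (-97.0 - lon)          # made-up elevations
grid = np.column_stack([lat, lon, elev])
temps = 280.0 + 2.0 * (lat - 34.0)

# Two "Mesonet stations": the grid centre and one grid corner.
stations = np.array([[35.5, -98.5, 450.0], [34.0, -100.0, 600.0]])
est, est_var = krige(grid, temps, stations)
```

The returned variance corresponds to the "variance of the measurement estimates" that the feature-engineering step reuses as an input.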
Feature engineering
Goal: Build a learning set $L$ from the measurement estimates.
1. Concatenate the estimates at all periods $h$ and for all types $f$, for each Mesonet station $j$ and day $d$:
$L = \{ (x_{jd} = (\hat{m}_{jdh_1f_1}, \hat{m}_{jdh_1f_2}, \dots),\ y_{jd} = p_{jd}) \}$
where $p_{jd}$ is the energy production at Mesonet station $j$ and day $d$.
2. Extend inputs $x_{jd}$ with engineered features:
◮ Solar features (delta between sunrise and sunset)
◮ Temporal features (day of year, month)
◮ Spatial features (latitude, longitude, elevation)
◮ Non-linear combinations of measurement estimates
◮ Daily mean estimates
◮ Variance of the measurement estimates, as produced by the Gaussian Processes
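As a rough illustration of how one input vector could be assembled from the items listed above, the sketch below concatenates the per-period estimates and appends a few engineered features. The function names and the feature ordering are hypothetical. The day length uses a standard sinusoidal approximation of solar declination, which is an assumption rather than the formula used by the authors. Non-linear combinations and Gaussian Process variances are left out for brevity.

```python
import math
from datetime import date

def day_length_hours(lat_deg, day_of_year):
    """Approximate time between sunrise and sunset, in hours (assumed formula)."""
    decl = math.radians(23.44) * math.sin(2 * math.pi * (284 + day_of_year) / 365)
    cos_h = -math.tan(math.radians(lat_deg)) * math.tan(decl)
    cos_h = max(-1.0, min(1.0, cos_h))  # clamp to handle polar day or night
    return 24.0 / math.pi * math.acos(cos_h)

def build_features(estimates, day, lat, lon, elevation):
    """Build one input vector for a (station, day) pair.

    estimates[h][f] is the interpolated value for period h and measurement
    type f at the station (hypothetical layout).
    """
    flat = [v for period in estimates for v in period]               # concatenation
    daily_means = [sum(col) / len(estimates) for col in zip(*estimates)]
    doy = day.timetuple().tm_yday
    solar = [day_length_hours(lat, doy)]
    temporal = [doy, day.month]
    spatial = [lat, lon, elevation]
    return flat + daily_means + solar + temporal + spatial

# Two periods, two measurement types (made-up numbers).
x = build_features([[280.0, 0.5], [290.0, 0.7]], date(1998, 6, 21), 35.0, -98.0, 400.0)
```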
Predicting energy production
Goal: Predict daily energy production at Mesonet sites.
1. Learn a model using Gradient Boosted Regression Trees (sklearn.ensemble.GradientBoostingRegressor), predicting output y from inputs x:
◮ Use the Least Absolute Deviation loss for robustness;
◮ Optimize hyper-parameters on an internal validation set.
2. For further robustness, repeat Step 1 several times (using different random seeds) and aggregate the predictions of all models.
Results
Evaluation
◮ Held-out data from 2008 - 2012.
◮ Mean Absolute Error (MAE) as metric:
$\mathrm{MAE} = \frac{1}{JD} \sum_{j=1}^{J} \sum_{d=1}^{D} | p_{jd} - \hat{p}_{jd} |$
Results
Method            Held-out score [MAE]    ∆ [%]
GMM               4019469.94              46.19
Spline Interp.    2611293.30              17.17
Kriging + GBRT    2162799.74              -
Best              2107588.17              -2.62
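The metric above is simple enough to write out directly. The sketch below computes the MAE over a stations-by-days table and also shows one reading of the ∆ column: the gap to the Kriging + GBRT score, expressed relative to each method's own score. That reading is inferred from the numbers in the table (it reproduces them to within rounding) and is not stated on the slide. Function names are hypothetical.

```python
def mean_absolute_error(p, p_hat):
    """MAE over a J x D table (stations x days) given as nested lists."""
    n = sum(len(row) for row in p)
    total = sum(
        abs(a - b)
        for row, row_hat in zip(p, p_hat)
        for a, b in zip(row, row_hat)
    )
    return total / n

def relative_delta(score, reference):
    """Gap between score and reference, in percent of score (inferred definition)."""
    return 100.0 * (score - reference) / score

# Two stations, two days (made-up numbers).
p = [[10.0, 12.0], [8.0, 9.0]]
p_hat = [[11.0, 12.0], [6.0, 9.5]]
mae = mean_absolute_error(p, p_hat)  # (1 + 0 + 2 + 0.5) / 4

ours = 2162799.74
delta_gmm = relative_delta(4019469.94, ours)
delta_spline = relative_delta(2611293.30, ours)
delta_best = relative_delta(2107588.17, ours)
```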
Error analysis
[Figure panels: Monthly MAE (x-axis: month), Day-of-year MAE (x-axis: doy), Daily MAE (2001 - 2007), Station MAE (x-axis: stid), Station MAE (spatial correlation)]
Conclusions
✓ Competitive results (4th position);
✓ Robust approach at all steps of the pipeline;
✗ Including additional data from nearest GEFS grid points might have further improved our results.
Questions?
g.louppe|peter.prettenhofer@gmail.com
Kriging illustration