Biased Resampling Strategies for Imbalanced Spatio-temporal Forecasting

  1. Biased Resampling Strategies for Imbalanced Spatio-temporal Forecasting
     Mariana Oliveira¹,², Nuno Moniz¹,², Luís Torgo¹,²,³ and Vítor Santos Costa¹,²
     5–8 October 2019

  2. Spatio-temporal Data
     • Remote monitoring equipment (Source: NDSU)
     • PM2.5 pollution levels (time series)
     • Air quality measurement station network (Source: Zheng et al., 2013)

  3. Imbalanced Numeric Forecasting
     [Figures: imbalanced domain; relevance function]
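The slide shows the relevance function only as a figure. The sketch below assumes a common automatic construction: relevance is 0 at the median of the target and 1 at the boxplot adjacent values. This sketch joins those points with straight lines, whereas published versions typically use a smooth piecewise cubic interpolation. The name `boxplot_relevance` is hypothetical. Cases whose relevance reaches a threshold would be treated as extreme; 0.9 is a frequent choice and is an assumption here.

```python
import statistics


def boxplot_relevance(values):
    """Return a hypothetical relevance function phi(y) in [0, 1].

    Assumption: phi is 0 at the median and rises linearly to 1 at the boxplot
    adjacent values (the most extreme observations within 1.5 * IQR of the
    quartiles). It stays at 1 beyond them.
    """
    data = sorted(values)
    q1, _, q3 = statistics.quantiles(data, n=4)
    med = statistics.median(data)
    iqr = q3 - q1
    # Adjacent values: most extreme points inside the 1.5 * IQR fences.
    low_adj = min(v for v in data if v >= q1 - 1.5 * iqr)
    high_adj = max(v for v in data if v <= q3 + 1.5 * iqr)

    def phi(y):
        span = (high_adj - med) if y >= med else (med - low_adj)
        if span == 0:
            return 0.0 if y == med else 1.0
        return min(1.0, abs(y - med) / span)

    return phi
```

For example, `boxplot_relevance([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100])` gives 0 at the median (6), 0.5 at 8, and 1 at 100.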

  4. Our Contribution
     Motivation
     • Random resampling approaches are often used to tackle this problem.
     • However, our data is not i.i.d.: there are spatial and temporal dependencies.
     Research questions
     • Will introducing a sampling bias that takes into account spatio-temporal dependencies improve performance?
     • Should we weight the dimensions differently?

  5. Biased Resampling

  6. Proposed Resampling Strategies
     Spatio-temporal Random Under-sampling (STRUS)
     • Keep all extreme cases
     • Keep only u% of normal cases, 0 < u < 100 (with sampling bias)
     Spatio-temporal Random Over-sampling (STROS)
     • Keep all (normal and extreme) cases
     • Add o% replicas of extreme cases, o > 0 (with sampling bias)
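The two strategies can be sketched under stated assumptions. The function names `strus` and `stros` and the per-case `weights` argument are hypothetical. `weights` stands for the sampling bias: a higher value makes a case more likely to be picked. `u` and `o` are given as fractions, as in the parameter grid, rather than percentages. The slides do not say how biased sampling is implemented. This sketch swaps in the Efraimidis-Spirakis key method for weighted sampling without replacement and `random.choices` for sampling replicas with replacement.

```python
import random


def _weighted_sample_without_replacement(items, weights, k, rng):
    """Efraimidis-Spirakis: keep the k items with the largest keys U ** (1 / w),
    U ~ Uniform[0, 1). A higher weight makes an item more likely to be kept."""
    eps = 1e-12  # floor so that zero weights do not divide by zero
    keyed = [(rng.random() ** (1.0 / max(w, eps)), item)
             for item, w in zip(items, weights)]
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in keyed[:k]]


def strus(cases, is_extreme, weights, u, seed=0):
    """Hypothetical STRUS: keep every extreme case plus a biased sample of
    round(u * n_normal) normal cases, with 0 < u < 1."""
    rng = random.Random(seed)
    extremes = [c for c, e in zip(cases, is_extreme) if e]
    normal_cases = [c for c, e in zip(cases, is_extreme) if not e]
    normal_weights = [w for w, e in zip(weights, is_extreme) if not e]
    k = round(u * len(normal_cases))
    kept = _weighted_sample_without_replacement(normal_cases, normal_weights, k, rng)
    return extremes + kept


def stros(cases, is_extreme, weights, o, seed=0):
    """Hypothetical STROS: keep every case and add round(o * n_extreme) replicas
    of extreme cases, drawn with replacement in proportion to their weights."""
    rng = random.Random(seed)
    extreme_cases = [c for c, e in zip(cases, is_extreme) if e]
    extreme_weights = [max(w, 1e-12) for w, e in zip(weights, is_extreme) if e]
    n_new = round(o * len(extreme_cases))
    if n_new == 0:
        return list(cases)
    replicas = rng.choices(extreme_cases, weights=extreme_weights, k=n_new)
    return list(cases) + replicas
```

Example: with 8 normal and 2 extreme cases, `strus(..., u=0.5)` returns the 2 extremes plus 4 normals. `stros(..., o=2)` returns all 10 cases plus 4 extreme replicas.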

  7. Spatio-temporal Sampling Bias
     Which cases should have a higher probability of being selected during resampling?
     • Temporal weight: more recent observations
     • Spatial weight (extreme cases): isolated rare cases
     • Spatial weight (normal cases): at each time-step, cases far away from rare cases
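The slide gives only the intuition behind each weight. The formulas below are hypothetical choices that follow it:
- The temporal weight grows linearly with time.
- An extreme case scores higher the fewer of its neighbours (within an assumed `radius`) are also extreme at the same time-step.
- A normal case scores higher the farther it is from the nearest extreme case at the same time-step.

The function names and the `radius` parameter are illustrative, not taken from the source.

```python
import math
from collections import defaultdict


def temporal_weights(times):
    """Hypothetical temporal weight: linear in time, so more recent
    observations weigh more; the most recent time-step gets 1."""
    t_min, t_max = min(times), max(times)
    return [(t - t_min + 1) / (t_max - t_min + 1) for t in times]


def spatial_weights(times, coords, is_extreme, radius):
    """Hypothetical spatial weights, computed separately at each time-step.

    Extreme case: 1 - share of its neighbours (within `radius`) that are also
    extreme, so isolated rare cases score high.
    Normal case: distance to the nearest extreme case at the same time-step,
    divided by the largest such distance, so far-away cases score high.
    """
    by_time = defaultdict(list)
    for i, t in enumerate(times):
        by_time[t].append(i)

    w = [0.0] * len(times)
    for idx in by_time.values():
        extremes = [i for i in idx if is_extreme[i]]
        normals = [i for i in idx if not is_extreme[i]]

        for i in extremes:
            neigh = [j for j in idx
                     if j != i and math.dist(coords[i], coords[j]) <= radius]
            if neigh:
                w[i] = 1.0 - sum(is_extreme[j] for j in neigh) / len(neigh)
            else:
                w[i] = 1.0  # no neighbours at all: fully isolated

        if not extremes:
            for i in normals:
                w[i] = 1.0  # no rare case at this time-step to be close to
            continue
        dist = {i: min(math.dist(coords[i], coords[j]) for j in extremes)
                for i in normals}
        d_max = max(dist.values(), default=0.0)
        for i in normals:
            w[i] = dist[i] / d_max if d_max > 0 else 1.0
    return w
```

These weights are meant to feed the `weights` argument of a biased resampler such as STRUS or STROS.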

  8. Spatio-temporal Sampling Bias
     What if the spatial and temporal dimensions have different impacts? Add a weighting parameter α.
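The slide does not spell out how α enters. One natural reading, stated here only as an assumption, is a convex combination of the temporal weight w_i^T and the spatial weight w_i^S of case i. The combined weight is then normalized into a selection probability:

```latex
w_i = \alpha \, w_i^{T} + (1 - \alpha) \, w_i^{S}, \qquad \alpha \in [0, 1],
\qquad
P(\text{case } i \text{ is selected}) = \frac{w_i}{\sum_{j} w_j}
```

Under this convention, α = 1 uses only the temporal weight and α = 0 uses only the spatial weight. Intermediate values of the grid α ∈ {0, 0.25, 0.5, 0.75, 1} mix the two. The authors' actual convention may differ.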

  9. Experiments

  10. Datasets
      Data source   ID   # time IDs   # loc IDs   % available   % extreme
      MESA          10   280          20          100           7.3
      NCDC          20   105          72          100           6.0
      TCE           30   330          26          100           6.3
                    31                                          3.8
                    32                                          2.4
      Rural         40   4k           70          ~49           7.5
      Beijing Air   50   11k          36          ~40           3.5
                    51                                          5.5
                    52                                          8.6
                    53                                          3.8
      (Blank cells repeat the values of the row above for the same data source.)

  11. Learning Process
      • Pre-processing, step 1 (feature engineering): calculate spatio-temporal indicators
      • Pre-processing, step 2 (resampling): None, RUS, ROS, STRUS or STROS
      • Model: MARS, Random Forest or RPART
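The slide names the feature-engineering step but not the indicators themselves. As a hedged illustration, two simple kinds of feature can be built from past values only: plain time lags at the same location, and the mean of past values in a space-time neighbourhood. The function names, the `radius` and `lags` parameters, and the dictionary layout of the data are all hypothetical.

```python
import math


def lag_features(series, loc, t, lags):
    """Past values at the same location: [y(loc, t-1), ..., y(loc, t-lags)].
    `series` maps (location, time) -> value; missing values come back as None."""
    return [series.get((loc, t - k)) for k in range(1, lags + 1)]


def st_indicator(series, coords, loc, t, radius, lags):
    """Hypothetical spatio-temporal indicator: mean of the values observed at
    locations within `radius` of `loc` (including `loc` itself) over the
    time-steps t-1 ... t-lags. Returns None when nothing was observed."""
    values = []
    for other, xy in coords.items():
        if math.dist(coords[loc], xy) > radius:
            continue  # outside the spatial neighbourhood
        for k in range(1, lags + 1):
            v = series.get((other, t - k))
            if v is not None:
                values.append(v)
    return sum(values) / len(values) if values else None
```

Both functions look only at time-steps before t, so the features never use information from the value being forecast.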

  12. Experimental Evaluation
      • Performance estimation procedure
      • Evaluation metrics

  13. Evaluation Metrics
      • Utility-based precision and recall for numeric prediction
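The slide's formulas were not preserved. For rough orientation only, the sketch below computes ordinary precision, recall and F1 restricted to relevant cases. It uses a relevance function `phi` and a threshold supplied by the caller; the default of 0.9 is an assumption. This is a simplified, classification-style stand-in. The utility-based versions reported on the slides also credit each hit according to how close the prediction is to the true value, and this sketch omits that. The function name is hypothetical.

```python
def thresholded_prf(y_true, y_pred, phi, threshold=0.9):
    """Simplified precision, recall and F1 over relevant (extreme) cases.

    A value counts as relevant when phi(value) >= threshold. No utility-based
    weighting is applied: every hit counts the same.
    """
    true_rel = [phi(y) >= threshold for y in y_true]
    pred_rel = [phi(y) >= threshold for y in y_pred]
    hits = sum(t and p for t, p in zip(true_rel, pred_rel))
    n_pred, n_true = sum(pred_rel), sum(true_rel)
    precision = hits / n_pred if n_pred else 0.0
    recall = hits / n_true if n_true else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

With a step relevance function (1 for y ≥ 10, else 0), true values [1, 2, 12, 15] and predictions [1, 11, 13, 3], one of two predicted extremes is correct and one of two true extremes is found. All three scores are therefore 0.5.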

  14. Performance Estimation Procedure
      • Prequential evaluation over temporal blocks
      [Figure: target y and features x split into consecutive train and test blocks along the time axis]
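Prequential evaluation over temporal blocks can be sketched as follows. The sketch assumes a growing window: each block is tested after training on all earlier blocks. Prequential schemes can also use a sliding window of recent blocks, and the slide does not say which variant was used. The function name and the `n_blocks` parameter are hypothetical.

```python
import math


def prequential_blocks(time_ids, n_blocks):
    """Growing-window prequential evaluation over temporal blocks (sketch).

    Sorts the distinct time IDs and cuts them into consecutive blocks of equal
    size; the last block may be shorter, and there may be fewer than n_blocks.
    For each block after the first, yields (train_times, test_times): train on
    all earlier blocks, test on this one.
    """
    times = sorted(set(time_ids))
    if not times:
        return
    size = math.ceil(len(times) / n_blocks)
    blocks = [times[i:i + size] for i in range(0, len(times), size)]
    for b in range(1, len(blocks)):
        train = [t for block in blocks[:b] for t in block]
        yield train, blocks[b]
```

With 10 time-steps and 5 blocks, this yields 4 train/test splits. The first trains on [0, 1] and tests on [2, 3]; the last trains on [0, ..., 7] and tests on [8, 9].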

  15. Parametrization
      • Internal tuning
      • Fixed a priori
      • Optimal a posteriori

  17. Internal Tuning
      Internal estimation procedure: for each training set, temporal-block cross-validation (CV)
      [Figure: target y and features x split into blocks along the time axis]
      Parameter grid search:
      • u ∈ {0.2, 0.4, 0.6, 0.8, 0.95}
      • o ∈ {0.5, 1, 2, 3, 4}
      • α ∈ {0, 0.25, 0.5, 0.75, 1}
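The grid above can be enumerated per strategy. This sketch makes three assumptions:
- RUS and ROS take only u or o, since they have no spatio-temporal bias and hence no α.
- STRUS and STROS take the product of their rate with α.
- `cv_score` is a hypothetical callback. It would run the temporal-block CV on the current training set and return a score to maximize, for example the mean F1ᵘ.

The function names are illustrative.

```python
from itertools import product

U_VALUES = [0.2, 0.4, 0.6, 0.8, 0.95]
O_VALUES = [0.5, 1, 2, 3, 4]
ALPHA_VALUES = [0, 0.25, 0.5, 0.75, 1]


def parameter_grid(strategy):
    """All candidate parameter settings for one resampling strategy."""
    if strategy == "RUS":
        return [{"u": u} for u in U_VALUES]
    if strategy == "ROS":
        return [{"o": o} for o in O_VALUES]
    if strategy == "STRUS":
        return [{"u": u, "alpha": a} for u, a in product(U_VALUES, ALPHA_VALUES)]
    if strategy == "STROS":
        return [{"o": o, "alpha": a} for o, a in product(O_VALUES, ALPHA_VALUES)]
    raise ValueError(f"unknown strategy: {strategy}")


def tune(strategy, cv_score):
    """Return the grid point with the highest score from the (hypothetical)
    temporal-block CV callback; ties go to the first point in grid order."""
    return max(parameter_grid(strategy), key=cv_score)
```

Under these assumptions, internal tuning evaluates 5 settings for RUS and ROS and 25 settings for STRUS and STROS on every training set.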

  19. Fixed a Priori
      For all training sets, the parameters are fixed at the middle of the grid:
      • u = 0.6 (grid: 0.2, 0.4, 0.6, 0.8, 0.95)
      • o = 2 (grid: 0.5, 1, 2, 3, 4)
      • α = 0.5 (grid: 0, 0.25, 0.5, 0.75, 1)

  21. Optimal a Posteriori
      External estimation procedure: for each data set, run the parameter grid search
      • u ∈ {0.2, 0.4, 0.6, 0.8, 0.95}
      • o ∈ {0.5, 1, 2, 3, 4}
      • α ∈ {0, 0.25, 0.5, 0.75, 1}
      and choose the parameters with the best results on the external (prequential) procedure.

  22. Results

  23. Average Rank of F1ᵘ (utility-based F1; lower rank is better)
      Parametrization        None   ROS    STROS  RUS    STRUS
      Internal tuning        4.60   3.07   2.37   2.67   2.30
      Fixed a priori         4.53   2.77   2.73   2.57   2.40
      Optimal a posteriori   5.00   3.07   2.27   2.93   1.73

  24. Parameter Sensitivity Analysis
      [Figure panels: two parameters; dimension weighting; each compared with RUS/ROS]

  25. Precision and Recall Trade-off
      [Figure panels: precision; recall; each compared with RUS/ROS]

  26. Conclusion

  27. Conclusion
      • Including a spatio-temporal bias when resampling improves performance.
      • The contribution of each dimension should be weighted:
        • When over-sampling: favour the temporal weight and prioritize more recent observations.
        • When under-sampling: favour the spatial weight and prioritize isolated rare cases and normal cases that are spatially distant from extreme cases.
      • Future work:
        • Study the impact of data characteristics on performance.
        • Consider local instead of global definitions of extreme values.

  28. Thank you! Code available at https://github.com/mrfoliveira/STResampling-DSAA2019
