Multi-Target Regression via Random Linear Target Combinations

Grigorios Tsoumakas, Eleftherios Spyromitros-Xioufis, Aikaterini Vrekou, Ioannis Vlahavas
Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece
Multi-Target Regression

Also known as multivariate or multi-output regression: given 𝑞 input variables 𝑌1, 𝑌2, …, 𝑌𝑞 and 𝑟 continuous output variables 𝑍1, 𝑍2, …, 𝑍𝑟, a model is learned from training examples with known values for both inputs and outputs, and is then used to predict the unknown output values of new instances.

[Figure: a table of training examples with known input and output values, followed by unknown instances whose output values must be predicted]
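In code terms, this setting amounts to learning a mapping from a feature matrix to a target matrix. A minimal sketch of the data layout, using synthetic numpy arrays purely for illustration (the estimator interface shown in comments is hypothetical):

```python
import numpy as np

# q input variables, r continuous output variables
n_train, n_test, q, r = 1000, 200, 10, 4

X_train = np.random.rand(n_train, q)   # training examples: known input values
Z_train = np.random.rand(n_train, r)   # training examples: known output values
X_test = np.random.rand(n_test, q)     # unknown instances: outputs must be predicted

# any multi-target regressor exposing fit/predict could be plugged in here,
# for example the RLC sketch shown later in these slides:
# model.fit(X_train, Z_train)
# Z_pred = model.predict(X_test)       # shape (n_test, r)
```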
Applications

• Ecological modeling: predicting physical and chemical properties of soil (forestry, agriculture) and water
• Economics: sales and price forecasting for multiple products
• Energy: solar/wind energy production forecasting, load forecasting
• We expect a rise in popularity: Internet of Things, Smart Cities

[Slide images: logos of the platforms that hosted corresponding multi-target regression competitions]
Inspiration

Transfer of ideas from multi-label classification [1] to multi-target regression:

• RA𝑙EL [2]: takes random subsets of labels and predicts all combinations of their binary label values
• RLC (our approach): takes random subsets of targets and predicts a random linear combination of the targets in each subset

1. E. Spyromitros-Xioufis, G. Tsoumakas, W. Groves, I. Vlahavas: Multi-Label Classification Methods for Multi-Target Regression. arXiv:1211.6581 [cs.LG]
2. G. Tsoumakas, I. Vlahavas: Random k-Labelsets: An Ensemble Method for Multilabel Classification. Proc. ECML PKDD 2007, pp. 406-417, Warsaw, Poland, 2007
Sketching RLC

• From the 𝑟 original targets, construct 𝑠 ≫ 𝑟 new targets as random linear combinations of the original targets: 𝑎 = 𝑍𝐷, where 𝑍 holds the original target values and 𝐷 is an 𝑟 × 𝑠 coefficient matrix of standard uniform values
• A multi-target regression model is trained to predict the 𝑠 new targets
• At prediction time, the original targets are recovered by solving a system of 𝑠 linear equations with 𝑟 unknowns

[Figure: worked example with 3 original targets and 6 new targets, showing the original target values, the 𝑟 × 𝑠 coefficient matrix 𝐷, the resulting new target values, and an unknown instance whose original targets are recovered from the predicted combinations]
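A minimal Python sketch of the RLC idea described above, assuming numpy and scikit-learn and using one gradient boosting model per new target. This is hypothetical illustration code, not the authors' MULAN (Java) implementation, and the class name `RLCSketch` is invented:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor


class RLCSketch:
    """Sketch of Random Linear target Combinations (RLC):
    train one single-target regressor per random linear combination of the
    original targets, recover the original targets by least squares."""

    def __init__(self, s=500, random_state=0):
        self.s = s                                  # number of new targets, s >> r
        self.rng = np.random.default_rng(random_state)

    def fit(self, X, Z):
        r = Z.shape[1]
        # r x s coefficient matrix D of standard uniform values
        self.D = self.rng.uniform(0.0, 1.0, size=(r, self.s))
        A = Z @ self.D                              # new targets: a = Z D
        # one single-target model per new target
        self.models = [GradientBoostingRegressor().fit(X, A[:, j])
                       for j in range(self.s)]
        return self

    def predict(self, X):
        A_hat = np.column_stack([m.predict(X) for m in self.models])
        # for every instance, solve a system of s equations with r unknowns
        Z_hat_T, *_ = np.linalg.lstsq(self.D.T, A_hat.T, rcond=None)
        return Z_hat_T.T                            # shape (n_instances, r)
```

A call such as `RLCSketch(s=500).fit(X_train, Z_train).predict(X_test)` would then return predictions for all 𝑟 original targets.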
Some More Details

• Assumption: the original targets take values from the same domain
• Parameter 𝑙 ∈ {2, …, 𝑟}: the number of targets being combined in each new target (only 𝑙 of the 𝑟 coefficients in each column of 𝐷 are non-zero)
• Each original target is involved in 𝑠𝑙/𝑟 new targets

[Figure: the same 𝑎 = 𝑍𝐷 example, with zeros in the 𝑟 × 𝑠 coefficient matrix 𝐷 of standard uniform values for the targets not participating in each combination]
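A sketch of how such an 𝑙-sparse coefficient matrix might be built. The simple uniform sampling below gives each original target an expected participation of 𝑠𝑙/𝑟 new targets; the authors' actual sampling scheme (which may enforce this balance exactly) can differ, and `make_coefficient_matrix` is a hypothetical helper:

```python
import numpy as np


def make_coefficient_matrix(r, s, l, rng=None):
    """r x s coefficient matrix: each column combines l distinct original
    targets, so each target takes part in s*l/r new targets on expectation."""
    rng = rng or np.random.default_rng()
    D = np.zeros((r, s))
    for j in range(s):
        idx = rng.choice(r, size=l, replace=False)   # l distinct original targets
        D[idx, j] = rng.uniform(0.0, 1.0, size=l)    # standard uniform coefficients
    return D


# example: 14 original targets (as in the wq dataset), 500 combinations of l = 2
D = make_coefficient_matrix(r=14, s=500, l=2)
```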
Relation to Output Coding

motivation                       | 𝑠 > 𝑟 | 𝑠 ≤ 𝑟
improve accuracy                 | RLC   | [2], [3]
improve computational complexity |       | [1], [4]

1. Hsu, D., Kakade, S., Langford, J., Zhang, T.: Multi-label prediction via compressed sensing. In: NIPS 2009, pp. 772-780
2. Zhang, Y., Schneider, J.G.: Multi-label output codes using canonical correlation analysis. In: AISTATS 2011
3. Zhang, Y., Schneider, J.G.: Maximum margin output coding. In: ICML 2012, icml.cc / Omnipress
4. Tai, F., Lin, H.T.: Multilabel classification with principal label space transformation. Neural Computation 24(9), 2012, pp. 2508-2542
Experimental Setup: Methods

• ST: one regression model per target, using gradient boosting
• MORF: multi-objective random forest of 100 trees
• RLC
  - multi-target regression algorithm: ST
  - solving the system of linear equations: least squares

All code and the specific experimental setup are available as part of MULAN
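The experiments themselves use MULAN (Java). Purely as an illustration, rough scikit-learn analogues of the two baselines might look as follows; these are not the implementations used in the paper, and the MORF analogue in particular only approximates a multi-objective random forest:

```python
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor

# ST: one gradient boosting model per target, trained independently
st = MultiOutputRegressor(GradientBoostingRegressor())

# rough analogue of MORF: a forest of 100 trees predicting all targets
# jointly (scikit-learn's random forest handles multi-output y natively)
morf_like = RandomForestRegressor(n_estimators=100)

# st.fit(X_train, Z_train)
# morf_like.fit(X_train, Z_train)
```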
Experimental Setup: Datasets

Name                                 | Abbreviation      | Examples    | Features  | Targets
Airline Ticket Price 1 / 2           | atp1d / atp7d     | 337 / 296   | 411       | 6
Electrical Discharge Machining       | edm               | 154         | 16        | 2
Occupational Employment Survey 1 / 2 | oes1997 / oes2010 | 334 / 403   | 263 / 298 | 16
River Flow 1 / 2                     | rf1 / rf2         | 9125        | 64 / 576  | 8
Solar Flare 1 / 2                    | sf1969 / sf1978   | 323 / 1066  | 26 / 27   | 3
Supply Chain Management 1 / 2        | scm1d / scm20d    | 9803 / 8966 | 280 / 61  | 16
Water Quality                        | wq                | 1060        | 16        | 14
Studying the 𝑠 Parameter

[Figure: average aRRMSE of our method (y-axis) with respect to 𝑠 (x-axis), across all datasets and all 𝑙 values]
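aRRMSE is the average Relative Root Mean Squared Error over the 𝑟 targets, the usual multi-target regression metric. A minimal sketch of the metric, assuming the per-target error is normalised by that of a mean predictor (the authors' exact normalisation, e.g. train- vs. test-set mean, may differ):

```python
import numpy as np


def arrmse(Z_true, Z_pred, Z_mean):
    """Average Relative RMSE across targets.

    Z_true, Z_pred: (n_instances, r) arrays of true and predicted values.
    Z_mean: (r,) per-target means used by the baseline 'mean' predictor.
    """
    num = np.sum((Z_true - Z_pred) ** 2, axis=0)   # model squared error per target
    den = np.sum((Z_true - Z_mean) ** 2, axis=0)   # mean-predictor squared error
    return np.mean(np.sqrt(num / den))             # average over the r targets
```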
Studying the 𝑙 Parameter

[Figure: aRRMSE of our method (y-axis) on the atp1d dataset with respect to 𝑠 (x-axis), for 𝑙 ∈ {2, 3, 4, 5, 6}]
Comparative Results

• Results for 𝑙 = 2/3 and 𝑠 = 500 models
• ST with gradient boosting appears to be a strong baseline
• RLC is better than both ST and MORF
• A Wilcoxon signed-rank test at the 95% level shows a statistically significant difference between RLC and ST

Wins:Losses
       RLC    ST    MORF
RLC     -    10:2   8:4
ST     2:10    -    7:5
MORF   4:8   5:7     -

Average rank: RLC 1.5, ST 2.25, MORF 2.25
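For completeness, a sketch of how such a comparison could be run with SciPy over per-dataset aRRMSE scores. The score lists below are placeholders for illustration only, not the paper's results:

```python
from scipy.stats import wilcoxon

# hypothetical per-dataset aRRMSE scores of two methods on 12 datasets
rlc_scores = [0.41, 0.52, 0.63, 0.48, 0.55, 0.39, 0.60, 0.71, 0.58, 0.45, 0.50, 0.66]
st_scores = [0.43, 0.54, 0.66, 0.49, 0.57, 0.42, 0.62, 0.70, 0.60, 0.47, 0.51, 0.69]

stat, p_value = wilcoxon(rlc_scores, st_scores)
print(f"Wilcoxon signed-rank: statistic={stat}, p={p_value:.3f}")
# a p-value below 0.05 indicates a significant difference at the 95% level
```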
Pairwise Target Correlations

[Figure: heat-map of the pairwise target correlations for the scm20d dataset]

Per dataset: gain (%) of RLC over ST, and the median and standard deviation of the pairwise target correlations.

dataset  | atp1d  | atp7d  | edm    | sf1969 | sf1978 | oes10  | oes97  | rf1    | rf2    | scm1d  | scm20d | wq
gain (%) | 3.6    | 2.6    | 4.6    | 5.0    | 3.1    | 7.9    | 2.5    | -1.3   | -2.0   | 1.6    | 1.4    | 1.3
median   | 0.8013 | 0.6306 | 0.0051 | 0.2242 | 0.1484 | 0.8479 | 0.7952 | 0.4077 | 0.4077 | 0.6526 | 0.5785 | 0.0751
stdev    | 0.0788 | 0.1602 | -      | 1.1247 | 1.2006 | 0.0972 | 0.0785 | 0.3125 | 0.3125 | 0.1316 | 0.1483 | 0.0717
Pairwise Target Correlations

• No strong correlation between the median of the pairwise target correlations and the gain of our approach over ST (𝑆 = 0.15)
• The higher the variance of the pairwise target correlations, the more difficult it is for our approach to improve over ST (𝑆 = −0.68)
• Between dataset variants, a higher median leads to higher gains

(Same gain / median / stdev table as on the previous slide; a sketch of this analysis follows below.)
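A sketch of the kind of per-dataset analysis summarised in the table, assuming a target matrix `Z` per dataset; the authors' exact procedure (e.g. the correlation coefficient used for the cross-dataset comparison) may differ:

```python
import numpy as np


def pairwise_target_correlation_stats(Z):
    """Median and standard deviation of the pairwise target correlations
    for one dataset, given its (n_instances, r) target matrix Z."""
    corr = np.corrcoef(Z, rowvar=False)              # r x r correlation matrix
    pairs = corr[np.triu_indices_from(corr, k=1)]    # upper triangle, no diagonal
    return np.median(pairs), np.std(pairs)

# across datasets, the per-dataset medians (or stdevs) could then be
# correlated with the gains of RLC over ST, e.g. via scipy.stats.spearmanr
```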
Recap

• Our approach
  - constructs new targets by taking random linear combinations of the existing targets
  - solves a linear equation system at prediction time
• Relation to multi-label classification methods: RA𝑙EL, output coding
• Results
  - RLC is significantly better than a strong baseline (ST)
  - RLC is better than a state-of-the-art approach (MORF)
  - an interesting viewpoint based on the pairwise target correlations
Future Work

• Alternative randomization: Gaussian matrices, sparse Rademacher matrices (see the sketch below)
• Theoretical understanding: WHY and WHEN it works
• Increase ensemble diversity: multiple coefficient matrices (e.g. 5 matrices of 100 combinations each vs. 1 matrix of 500)
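As a rough, illustrative sketch of the first bullet, the two alternative randomisation schemes could be generated as follows (dimensions and the sparsity parameter `k` are arbitrary examples, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
r, s = 8, 500

# Gaussian coefficient matrix
D_gauss = rng.standard_normal((r, s))

# sparse Rademacher-style matrix: entries are +1 or -1 each with
# probability 1/(2k) and 0 otherwise
k = 3
probs = [1 / (2 * k), 1 - 1 / k, 1 / (2 * k)]
D_rademacher = rng.choice([-1.0, 0.0, 1.0], size=(r, s), p=probs)
```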
Multi-Target Regression via Random Linear Target Combinations

Thank you!

Grigorios Tsoumakas, Eleftherios Spyromitros-Xioufis, Aikaterini Vrekou, Ioannis Vlahavas
Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece