Multi-Target Regression via Random Linear Target Combinations


Grigorios Tsoumakas, Eleftherios Spyromitros-Xioufis, Aikaterini Vrekou, Ioannis Vlahavas
Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece


SLIDE 1

Multi-Target Regression via Random Linear Target Combinations

Grigorios Tsoumakas, Eleftherios Spyromitros-Xioufis, Aikaterini Vrekou, Ioannis Vlahavas

Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece

SLIDE 2

Multi-Target Regression

Also known as multivariate or multi-output regression: p input variables X1 … Xp and q continuous output variables Y1 … Yq.

                     X1     X2    …    Xp      Y1     Y2    …    Yq
training examples    0.12   1     …    12      0.14   10    …    -1.3
                     2.34   9     …    -5      4.15   12    …    -2.0
                     1.22   3     …    40      1.01   28    …    -5.3
unknown instances    2.18   2     …    8       ?      ?     …    ?
                     1.76   7     …    23      ?      ?     …    ?

SLIDE 3

Applications

  • Ecological modeling
      • Predicting physical and chemical properties of soil (forestry, agriculture) and water
  • Economics
      • Sales and price forecasting for multiple products
  • Energy
      • Solar/wind energy production forecasting
      • Load forecasting
  • We expect a rise in popularity
      • Internet of Things, Smart Cities

Images are logos of corresponding multi-target regression competitions hosted at

SLIDE 4

Inspiration

Transfer of ideas [1] from multi-label classification to multi-target regression:

  • Multi-label classification: RAkEL [2] = random subset of labels + all combinations of binary label values
  • Multi-target regression: ? = random subset of targets + ?

1 E. Spyromitros-Xioufis, G. Tsoumakas, W. Groves, I. Vlahavas, Multi-Label Classification Methods for Multi-Target Regression, arXiv:1211.6581 [cs.LG]
2 G. Tsoumakas, I. Vlahavas, Random k-Labelsets: An Ensemble Method for Multilabel Classification, Proc. ECML PKDD 2007, pp. 406-417, Warsaw, Poland, 2007

SLIDE 5

Inspiration

Transfer of ideas [1] from multi-label classification to multi-target regression:

  • Multi-label classification: RAkEL [2] = random subset of labels + all combinations of binary label values
  • Multi-target regression: RLC = random subset of targets + a random linear combination of targets

1 E. Spyromitros-Xioufis, G. Tsoumakas, W. Groves, I. Vlahavas, Multi-Label Classification Methods for Multi-Target Regression, arXiv:1211.6581 [cs.LG]
2 G. Tsoumakas, I. Vlahavas, Random k-Labelsets: An Ensemble Method for Multilabel Classification, Proc. ECML PKDD 2007, pp. 406-417, Warsaw, Poland, 2007

SLIDE 7

Sketching RLC

q targets (original), one row per training example:

      y1     y2     y3
1    -0.5   -0.2    0.4
2     0.5   -0.3   -1
3     0.9    0.6   -0.5
4    -0.8   -0.5   -0.9
5    -0.5    0.6    0.7
6     0.4    0.1    0.8
7    -0.2   -0.3    0.8
8    -0.4   -0.4   -0.9

r ≫ q targets (new): z1 z2 z3 z4 z5 z6, random linear combinations of the original targets, one value per training example 1 … 8
SLIDE 8

Sketching RLC

q targets (original), one row per training example:

      y1     y2     y3
1    -0.5   -0.2    0.4
2     0.5   -0.3   -1
3     0.9    0.6   -0.5
4    -0.8   -0.5   -0.9
5    -0.5    0.6    0.7
6     0.4    0.1    0.8
7    -0.2   -0.3    0.8
8    -0.4   -0.4   -0.9

q × r coefficient matrix C of standard uniform values (blank cells shown as 0):

      z1     z2     z3     z4     z5     z6
y1    0      0.7   -0.6   -0.6    0     -0.8
y2    0.1    0      0.4    0     -0.4   -0.5
y3    0.7    0.3    0      0.9   -0.2    0

r ≫ q targets (new), random linear combinations of the original targets, Z = YC:

      z1      z2      z3      z4      z5      z6
1     0.26   -0.23    0.22    0.66    0       0.5
2    -0.73    0.05   -0.42   -1.2     0.32   -0.25
3    -0.29    0.48   -0.3    -0.99   -0.14   -1.02
4    -0.68   -0.83    0.28   -0.33    0.38    0.89
5     0.55   -0.14    0.54    0.93   -0.38    0.1
6     0.57    0.52   -0.2     0.48   -0.2    -0.37
7     0.53    0.1     0       0.84   -0.04    0.31
8    -0.67   -0.55    0.08   -0.57    0.34    0.52

Multi-target regression model: for an unknown instance, the new targets are predicted and the original targets are recovered by solving a system of r linear equations with q unknowns:

      z1     z2     z3     z4     z5     z6        y1   y2   y3
     -0.2    1      0.2   -0.5   -0.4    0.5       ?    ?    ?
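The numbers in this sketch can be checked with a few lines of NumPy. The block below is illustrative only: it re-enters the example targets and coefficients from the slide (blank coefficient cells taken as 0), rebuilds the new targets as Z = YC, and maps the predicted new targets of the unknown instance back to the original target space. The variable names and the use of `numpy.linalg.lstsq` for the least-squares step are assumptions, not necessarily what the authors' implementation does.

```python
import numpy as np

# Original targets from the slide (8 training examples, 3 targets).
Y = np.array([
    [-0.5, -0.2,  0.4],
    [ 0.5, -0.3, -1.0],
    [ 0.9,  0.6, -0.5],
    [-0.8, -0.5, -0.9],
    [-0.5,  0.6,  0.7],
    [ 0.4,  0.1,  0.8],
    [-0.2, -0.3,  0.8],
    [-0.4, -0.4, -0.9],
])

# Coefficient matrix from the slide; each column combines 2 of the 3 targets.
C = np.array([
    [0.0, 0.7, -0.6, -0.6,  0.0, -0.8],
    [0.1, 0.0,  0.4,  0.0, -0.4, -0.5],
    [0.7, 0.3,  0.0,  0.9, -0.2,  0.0],
])

# New targets: one linear combination of the original targets per column of C.
Z = Y @ C

# Predicted new targets for the unknown instance (values from the slide).
z_hat = np.array([-0.2, 1.0, 0.2, -0.5, -0.4, 0.5])

# Recover the original targets: solve C^T y = z_hat
# (6 equations, 3 unknowns) in the least-squares sense.
y_hat, *_ = np.linalg.lstsq(C.T, z_hat, rcond=None)
```

Every training row of Z is an exact linear function of the matching row of Y, so decoding a training row of Z this way returns that row of Y. For model predictions the system is generally inconsistent, and least squares returns the best-fitting original targets.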

SLIDE 9

Some More Details

q × r coefficient matrix C of standard uniform values for the original targets y1 … y3 (Z = YC; blank cells shown as 0):

      z1     z2     z3     z4     z5     z6
y1    0      0.7   -0.6   -0.6    0     -0.8
y2    0.1    0      0.4    0     -0.4   -0.5
y3    0.7    0.3    0      0.9   -0.2    0

  • Assumption: original targets take values from the same domain
  • Parameter k ∈ {2, …, q} (number of targets being combined)
  • Each original target is involved in rk/q new targets
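One way to build such a coefficient matrix is sketched below. It assumes a simple greedy scheme that is not taken from the slides: each new target combines the k original targets used least so far, with ties broken at random, so that every original target ends up in rk/q new targets whenever q divides rk. The function name `random_coefficient_matrix` and its parameters are hypothetical.

```python
import numpy as np

def random_coefficient_matrix(q, r, k, seed=0):
    """Return a q x r matrix whose columns each have k non-zero entries."""
    if not 2 <= k <= q:
        raise ValueError("k must satisfy 2 <= k <= q")
    rng = np.random.default_rng(seed)
    C = np.zeros((q, r))
    counts = np.zeros(q, dtype=int)  # new targets each original target is in
    for j in range(r):
        # Sort by usage count (primary key) and a random tie-breaker.
        order = np.lexsort((rng.random(q), counts))
        chosen = order[:k]
        # Uniform values in (0, 1], so a selected entry is never exactly 0.
        C[chosen, j] = 1.0 - rng.random(k)
        counts[chosen] += 1
    return C

# Same sizes as the slide's example: q = 3, r = 6, k = 2.
C = random_coefficient_matrix(q=3, r=6, k=2)
per_original_target = (C != 0).sum(axis=1)  # rk/q = 4 for each row here
per_new_target = (C != 0).sum(axis=0)       # k = 2 for each column
```

With q = 3, r = 6 and k = 2, each original target takes part in 6 · 2 / 3 = 4 new targets, which matches the four non-zero entries per row in the example matrix.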

SLIDE 10

Relation to Output Coding

Output coding methods, grouped by motivation and by the number of new targets (r > q or r ≤ q):

  • Improve accuracy: RLC, [2,3]
  • Improve computational complexity: [1], [4]

1 Hsu, D., Kakade, S., Langford, J., Zhang, T.: Multi-label prediction via compressed sensing. In: NIPS 2009, 772–780
2 Zhang, Y., Schneider, J.G.: Multi-label output codes using canonical correlation analysis. In: AISTATS 2011
3 Zhang, Y., Schneider, J.G.: Maximum margin output coding. In: ICML 2012, icml.cc / Omnipress
4 Tai, F., Lin, H.T.: Multilabel classification with principal label space transformation. Neural Computation 24(9), 2012, 2508–2542

SLIDE 11

Experimental Setup: Methods

  • ST
      • One regression model per target using gradient boosting
  • MORF
      • Multi-objective random forest of 100 trees
  • RLC
      • Multi-target regression algorithm: ST
      • Solving system of linear equations: least squares

All code and specific experimental setup available at MULAN
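The RLC configuration listed above (single-target models on the new targets, least squares to map back) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the MULAN implementation: the gradient boosting regressors are swapped for ordinary least-squares linear models to keep the example dependency-free, the k targets of each combination are drawn without any balancing, and the names `fit_rlc` and `predict_rlc` are hypothetical.

```python
import numpy as np

def fit_rlc(X, Y, r=50, k=2, seed=0):
    """Build r random combinations of k targets and fit one model per combination."""
    rng = np.random.default_rng(seed)
    q = Y.shape[1]
    C = np.zeros((q, r))
    for j in range(r):
        chosen = rng.choice(q, size=k, replace=False)
        C[chosen, j] = rng.random(k)  # standard uniform coefficients
    Z = Y @ C                         # new targets
    # Single-target step: one linear least-squares model per new target
    # (a stand-in for the gradient boosting models named on the slide).
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # intercept column
    W, *_ = np.linalg.lstsq(Xb, Z, rcond=None)
    return C, W

def predict_rlc(X, C, W):
    """Predict the new targets, then solve C^T y = z_hat by least squares."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    Z_hat = Xb @ W                                   # shape (n, r)
    Y_hat_T, *_ = np.linalg.lstsq(C.T, Z_hat.T, rcond=None)
    return Y_hat_T.T                                 # shape (n, q)

# Hypothetical data: targets that are an exact linear function of the inputs.
rng = np.random.default_rng(1)
X = rng.standard_normal((40, 5))
Y = X @ rng.standard_normal((5, 3)) + 0.5
C, W = fit_rlc(X, Y, r=50, k=2)
Y_hat = predict_rlc(X, C, W)
```

With a linear base learner the random combinations add nothing over fitting each target directly, so the sketch only illustrates the data flow. Any benefit depends on a nonlinear base learner such as the gradient boosting used in the experiments.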

SLIDE 12

Experimental Setup: Datasets

#       Name                                   Abbreviation        Examples      Features    Targets
1,2     Airline Ticket Price 1 / 2             atp1d / atp7d       337 / 296     411         6
3       Electrical Discharge Machining         edm                 154           16          2
4,5     Occupational Employment Survey 1 / 2   oes1997 / oes2010   334 / 403     263 / 298   16
6,7     River Flow 1 / 2                       rf1 / rf2           9125          64 / 576    8
8,9     Solar Flare 1 / 2                      sf1969 / sf1978     323 / 1066    26 / 27     3
10,11   Supply Chain Management 1 / 2          scm1d / scm20d      9803 / 8966   280 / 61    16
12      Water Quality                          wq                  1060          16          14

SLIDE 13

Studying the r Parameter

Average aRRMSE of our method (y-axis) with respect to r (x-axis), across all datasets and all k values
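The slide does not define aRRMSE. A common definition, assumed here, is the average over targets of the relative root mean squared error, that is, the model's squared error on a target divided by the squared error of always predicting that target's mean. The function name `arrmse` and the fallback to the mean of the evaluated targets (rather than a training-set mean) are assumptions of this sketch.

```python
import numpy as np

def arrmse(Y_true, Y_pred, Y_train_mean=None):
    """Average relative RMSE over the target columns (lower is better)."""
    Y_true = np.asarray(Y_true, dtype=float)
    Y_pred = np.asarray(Y_pred, dtype=float)
    if Y_train_mean is None:
        # Assumption: use the mean of the evaluated targets as the baseline.
        Y_train_mean = Y_true.mean(axis=0)
    squared_error = ((Y_true - Y_pred) ** 2).sum(axis=0)
    baseline_error = ((Y_true - Y_train_mean) ** 2).sum(axis=0)
    return float(np.mean(np.sqrt(squared_error / baseline_error)))

# Hypothetical values for two targets on very different scales.
Y_true = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
perfect = arrmse(Y_true, Y_true)                           # 0.0
mean_only = arrmse(Y_true, np.tile([2.0, 20.0], (3, 1)))   # 1.0
```

Dividing by the baseline error makes the targets comparable regardless of scale: a value below 1 means the model beats the mean predictor on average.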

SLIDE 14

Studying the k Parameter

aRRMSE of our method (y-axis) on the atp1d dataset with respect to r (x-axis) for k ∈ {2, 3, 4, 5, 6}

SLIDE 15

Comparative Results

  • Results for k=2/3 and r=500 models

             RLC    ST     MORF
Avg. Rank    1.5    2.25   2.25

Pairwise comparison (row vs. column):

        RLC    ST     MORF
RLC     -      10:2   8:4
ST      2:10   -      7:5
MORF    4:8    5:7    -

  • Wilcoxon signed-rank test at 95% shows a statistically significant difference between RLC and ST
  • ST with gradient boosting appears to be a strong baseline
  • RLC is better than ST and MORF
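Average ranks and pairwise counts of this kind can be computed from a table of per-dataset errors, as in the sketch below. The error values are hypothetical placeholders, not the results behind the slide, and the helper names `average_ranks` and `wins_losses` are made up for illustration.

```python
import numpy as np

def average_ranks(errors):
    """errors: (n_datasets, n_methods), lower is better; ties share the mean rank."""
    errors = np.asarray(errors, dtype=float)
    ranks = np.empty_like(errors)
    for i, row in enumerate(errors):
        for m, value in enumerate(row):
            better = np.sum(row < value)
            tied = np.sum(row == value)  # includes the method itself
            ranks[i, m] = better + (tied + 1) / 2
    return ranks.mean(axis=0)

def wins_losses(errors, a, b):
    """Count the datasets where method a has lower / higher error than method b."""
    errors = np.asarray(errors, dtype=float)
    wins = int(np.sum(errors[:, a] < errors[:, b]))
    losses = int(np.sum(errors[:, a] > errors[:, b]))
    return wins, losses

# Hypothetical errors (NOT the slide's results); columns: RLC, ST, MORF.
errors = np.array([
    [0.30, 0.35, 0.40],
    [0.50, 0.45, 0.55],
    [0.20, 0.25, 0.22],
    [0.60, 0.60, 0.70],
])
ranks = average_ranks(errors)
rlc_vs_st = wins_losses(errors, 0, 1)
```

For the significance test itself, SciPy provides `scipy.stats.wilcoxon`, which takes the paired per-dataset errors of two methods.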

SLIDE 16

Pairwise Target Correlations

Dataset   gain (%)   median   stdev
atp1d      3.6       0.8013   0.0788
atp7d      2.6       0.6306   0.1602
edm        4.6       0.0051   -
sf1969     5.0       0.2242   1.1247
sf1978     3.1       0.1484   1.2006
oes10      7.9       0.8479   0.0972
oes97      2.5       0.7952   0.0785
rf1       -1.3       0.4077   0.3125
rf2       -2.0       0.4077   0.3125
scm1d      1.6       0.6526   0.1316
scm20d     1.4       0.5785   0.1483
wq         1.3       0.0751   0.0717

Figure: heat-map of the pairwise target correlations for the scm20d dataset

SLIDE 17

Pairwise Target Correlations

  • The higher the variance of the pairwise target correlations, the more difficult it is for our approach to improve over ST (R = -0.68)
  • No strong correlation between the median of pairwise target correlations and the gain of our approach over ST (R = 0.15)
  • Between dataset variants, higher median leads to higher gains

Dataset   gain (%)   median   stdev
atp1d      3.6       0.8013   0.0788
atp7d      2.6       0.6306   0.1602
edm        4.6       0.0051   -
sf1969     5.0       0.2242   1.1247
sf1978     3.1       0.1484   1.2006
oes10      7.9       0.8479   0.0972
oes97      2.5       0.7952   0.0785
rf1       -1.3       0.4077   0.3125
rf2       -2.0       0.4077   0.3125
scm1d      1.6       0.6526   0.1316
scm20d     1.4       0.5785   0.1483
wq         1.3       0.0751   0.0717
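The median and standard deviation of the pairwise target correlations can be computed from a target matrix along the following lines. The slides do not say which correlation coefficient or which standard deviation estimator was used, so Pearson correlation and the sample standard deviation are assumptions here, as is the function name `pairwise_correlation_stats`.

```python
import numpy as np

def pairwise_correlation_stats(Y):
    """Median and sample stdev of the correlations between all target pairs."""
    Y = np.asarray(Y, dtype=float)
    q = Y.shape[1]
    corr = np.corrcoef(Y, rowvar=False)    # q x q Pearson correlation matrix
    pairs = corr[np.triu_indices(q, k=1)]  # one value per unordered pair
    median = float(np.median(pairs))
    # With only two targets there is a single pair and no spread to measure.
    stdev = float(np.std(pairs, ddof=1)) if pairs.size > 1 else float("nan")
    return median, stdev

# Hypothetical targets: the second rises with the first, the third falls with it.
a = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
median3, stdev3 = pairwise_correlation_stats(np.column_stack([a, 2 * a + 1, -a]))
median2, stdev2 = pairwise_correlation_stats(np.column_stack([a, 2 * a + 1]))
```

A dataset with two targets has a single pair, so its standard deviation is undefined, which is consistent with the missing stdev entry for edm.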

SLIDE 18

Recap

  • Our approach
      • Constructs new targets by taking random linear combinations of existing targets
      • Solves a linear equation system at prediction time
  • Relation to multi-label classification methods
      • RAkEL, output coding
  • Results
      • RLC is significantly better than a strong baseline
      • RLC is better than a state-of-the-art approach
      • Interesting viewpoint of average target correlations
SLIDE 19

Future Work

  • Alternative randomization
      • Gaussian matrices, sparse Rademacher matrices
  • Theoretical understanding
      • WHY and WHEN it works
  • Increase ensemble diversity
      • Multiple coefficient matrices (e.g. 5 x 100 vs 1 x 500)
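The two alternative randomizations named above could be generated as in the sketch below. The function names, the `density` parameter and the absence of any scaling factor are assumptions made for illustration; the slides do not specify how these matrices would be built or used.

```python
import numpy as np

def gaussian_matrix(q, r, seed=0):
    """Dense q x r matrix of independent standard normal coefficients."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((q, r))

def sparse_rademacher_matrix(q, r, density=1 / 3, seed=0):
    """Entries are +1 or -1 with probability density/2 each, and 0 otherwise."""
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=(q, r))
    mask = rng.random((q, r)) < density
    return signs * mask

# Hypothetical sizes: 16 original targets, 500 new targets.
G = gaussian_matrix(q=16, r=500)
S = sparse_rademacher_matrix(q=16, r=500)
```

Unlike a scheme that fixes the number of combined targets per column, independent sparse entries can leave a column entirely zero, so such columns would need to be redrawn or dropped.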
SLIDE 20

Multi-Target Regression via Random Linear Target Combinations

Thank you!

Grigorios Tsoumakas, Eleftherios Spyromitros-Xioufis, Aikaterini Vrekou, Ioannis Vlahavas
Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece