Stacking for supervised learning

Niall Rooney, NIKEL, University of Ulster


  1. Stacking for supervised learning
     Niall Rooney, NIKEL, University of Ulster

  2. Ensemble learning
     - Postulate multiple hypotheses to explain the data
     - Shortcomings of single-model learning algorithms (Dietterich, 2002):
         - Statistical problem
         - Computational problem
         - Representational problem

  3. Ensemble learning
     - Generalization error: bias + variance
         - Bias: how close the algorithm's average prediction is to the target
         - Variance: how much the algorithm's predictions "bounce around" for different training sets
         - A model that is too simple, or too inflexible, will have a large bias
         - A model that has too much flexibility will have a high variance
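
The two terms on this slide can be estimated by simulation. The sketch below is an illustration only, not taken from the slides: the sine target, the noise level, the test point `x0 = 0.25`, and the two toy learners (a constant mean predictor as the "too simple" model, 1-nearest-neighbour as the "too flexible" one) are all assumptions. For squared error, the bias term enters the decomposition squared, which is what the code measures.

```python
import math
import random

def target(x):
    # Assumed true function; any smooth target would do.
    return math.sin(2 * math.pi * x)

def sample_training_set(rng, n=20, noise=0.3):
    # Draw a fresh noisy training set of n points on [0, 1).
    xs = [rng.random() for _ in range(n)]
    ys = [target(x) + rng.gauss(0.0, noise) for x in xs]
    return xs, ys

def predict_mean(xs, ys, x0):
    # Inflexible model: ignores x0 and predicts the training mean.
    return sum(ys) / len(ys)

def predict_1nn(xs, ys, x0):
    # Flexible model: copies the label of the nearest training point.
    nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x0))
    return ys[nearest]

def bias2_and_variance(predict, x0, trials=500, seed=0):
    # Retrain on many training sets and look at the spread of predictions at x0.
    rng = random.Random(seed)
    preds = [predict(*sample_training_set(rng), x0) for _ in range(trials)]
    mean_pred = sum(preds) / trials
    bias2 = (mean_pred - target(x0)) ** 2
    variance = sum((p - mean_pred) ** 2 for p in preds) / trials
    return bias2, variance

bias2_simple, var_simple = bias2_and_variance(predict_mean, x0=0.25)
bias2_flex, var_flex = bias2_and_variance(predict_1nn, x0=0.25)
print("mean predictor: bias^2 =", bias2_simple, "variance =", var_simple)
print("1-NN predictor: bias^2 =", bias2_flex, "variance =", var_flex)
```

Under these assumptions the mean predictor should show the larger squared bias and the 1-NN predictor the larger variance, matching the last two bullets of the slide.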

  4. Ensemble learning
     - Generalization error: ensembles
         - Ensembles reduce bias and/or variance
         - To be effective, ensembles need diverse and accurate base models
         - Diversity is measured by the level of variability in the base members' predictions (for regression)
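
One way to make "variability in the base members' predictions" concrete for regression is the ambiguity decomposition of a uniformly averaged ensemble: the ensemble's squared error equals the average member squared error minus the average squared spread of the members around the ensemble prediction. The slides do not name this identity; the sketch below is an illustration, and the function name and example numbers are hypothetical.

```python
def ambiguity_decomposition(predictions, y):
    """Return (ensemble error, average member error, diversity) for one instance."""
    m = len(predictions)
    ensemble = sum(predictions) / m  # uniform average of the base predictions
    avg_member_error = sum((p - y) ** 2 for p in predictions) / m
    # Diversity: how far the members spread around the ensemble prediction.
    diversity = sum((p - ensemble) ** 2 for p in predictions) / m
    ensemble_error = (ensemble - y) ** 2
    return ensemble_error, avg_member_error, diversity

ens_err, avg_err, div = ambiguity_decomposition([2.0, 3.0, 4.5], y=3.0)
print(ens_err, avg_err - div)  # the two values agree up to rounding
```

The identity shows why both properties on the slide matter: accurate members keep the first term small, and diverse members make the subtracted term large.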

  5. Ensemble learning
     - Homogeneous learning: data sampling, feature sampling, randomization, parameter settings
     - Heterogeneous learning: same data, different learning algorithms

  6. Ensemble Learning
     [Diagram: the input features are passed to Classifier 1, Classifier 2, ..., Classifier N; their class predictions go to a combiner, which outputs the final class prediction.]

  7. Ensemble learning
     Methods of combination:
     - Voting, weighting, selection
     - Mixture of experts
     - Error-correcting output codes
     - Bagging
     - Boosting
     - Stacking

  8. Ensemble Learning: Stacking
     [Diagram: an instance is passed to Base Model 1, Base Model 2, ..., Base Model n; their outputs go to the meta-model, which produces the prediction.]

  9. Meta technique: SR
     [Diagram: an instance $x^*$ is passed to the base models $M_1, M_2, \ldots, M_m$, which output the base predictions $f_1(x^*), f_2(x^*), \ldots, f_m(x^*)$. The meta-level model Meta-M is trained on the cross-validation (CV) meta-training set $\{(f_1(x_j), \ldots, f_m(x_j), y_j)\}$ and combines the base predictions into the final prediction $\text{Meta-M}(f_1(x^*), \ldots, f_m(x^*))$.]
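
The pipeline on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's implementation: the two base learners (a one-dimensional least-squares line and a 3-nearest-neighbour average), the synthetic data, the 5-fold split, and the choice of an intercept-free linear meta-model are all hypothetical. The key point is that each meta-training row $(f_1(x_j), f_2(x_j), y_j)$ uses predictions from models that did not see $x_j$ during training.

```python
import random

def fit_linear(xs, ys):
    """Base model 1 (assumed): one-dimensional least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def fit_knn(xs, ys, k=3):
    """Base model 2 (assumed): average of the k nearest training labels."""
    def predict(x):
        nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x))[:k]
        return sum(ys[i] for i in nearest) / len(nearest)
    return predict

def cv_meta_training_set(xs, ys, learners, folds=5):
    """Build rows ([f_1(x_j), ..., f_m(x_j)], y_j) from out-of-fold predictions."""
    rows = []
    for fold in range(folds):
        train = [j for j in range(len(xs)) if j % folds != fold]
        held_out = [j for j in range(len(xs)) if j % folds == fold]
        models = [learn([xs[j] for j in train], [ys[j] for j in train])
                  for learn in learners]
        for j in held_out:
            rows.append(([model(xs[j]) for model in models], ys[j]))
    return rows

def fit_meta_weights(rows):
    """Meta-model (assumed): least-squares weights for two base predictions."""
    a11 = sum(f[0] * f[0] for f, _ in rows)
    a12 = sum(f[0] * f[1] for f, _ in rows)
    a22 = sum(f[1] * f[1] for f, _ in rows)
    b1 = sum(f[0] * y for f, y in rows)
    b2 = sum(f[1] * y for f, y in rows)
    det = a11 * a22 - a12 * a12  # 2x2 normal equations solved by Cramer's rule
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a12 * b1) / det

rng = random.Random(0)
xs = [rng.uniform(0.0, 3.0) for _ in range(60)]
ys = [x ** 2 + rng.gauss(0.0, 0.2) for x in xs]

learners = [fit_linear, fit_knn]
rows = cv_meta_training_set(xs, ys, learners)
w1, w2 = fit_meta_weights(rows)
f1, f2 = (learn(xs, ys) for learn in learners)  # refit base models on all data

def stacked_predict(x):
    # Final prediction: Meta-M(f_1(x*), f_2(x*)) with a linear Meta-M.
    return w1 * f1(x) + w2 * f2(x)

print("meta weights:", w1, w2)
print("stacked prediction at 1.5:", stacked_predict(1.5))
```

Because the weights are fitted on held-out predictions, a base model that merely memorizes its training data does not automatically receive a large weight.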

  10. Stacking for classification
      - Use class distributions from the base classifiers rather than class predictions; the meta-level training set is
        $\{(P_1(C_1 \mid x), \ldots, P_1(C_k \mid x), \ldots, P_m(C_1 \mid x), \ldots, P_m(C_k \mid x), y)\}$
      - Choice of meta-classifier: multi-response linear regression
          - For a classification task with k class values, k regression problems
          - Only use the probabilities related to class $C_j$ to predict class $C_j$
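
The feature layout described on this slide can be sketched as follows, with m base classifiers and k classes. The helper names, the example distributions, and in particular the weight values are hypothetical; in practice the weights of each class's linear model would be fitted by least squares on the meta-level training set, which this sketch does not do.

```python
def meta_instance(distributions):
    """Concatenate P_i(C_1|x), ..., P_i(C_k|x) over base classifiers i = 1..m."""
    return [p for dist in distributions for p in dist]

def class_features(meta_row, j, n_classes):
    """Select P_1(C_j|x), ..., P_m(C_j|x): the only inputs to class j's regression."""
    return meta_row[j::n_classes]

def mlr_predict(meta_row, weights, n_classes):
    """Score each class with its own linear model and return the best class index."""
    scores = []
    for j in range(n_classes):
        feats = class_features(meta_row, j, n_classes)
        scores.append(sum(w * f for w, f in zip(weights[j], feats)))
    return max(range(n_classes), key=lambda j: scores[j])

# Two base classifiers (m = 2), three classes (k = 3).
distributions = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
row = meta_instance(distributions)
# Hypothetical, hand-picked weights: one weight per base classifier, per class.
weights = [[0.3, 0.7], [0.3, 0.7], [0.3, 0.7]]
features_for_class_1 = class_features(row, 1, 3)
predicted = mlr_predict(row, weights, 3)
print(features_for_class_1, predicted)
```

Restricting each regression to the probabilities of its own class keeps every linear model at m inputs instead of m times k.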

  11. Stacking for classification
      - Different "types" of base classifiers
      - Multi-response model trees are used to guarantee better performance than selecting the best classifier

  12. Stacking for regression
      - Linear regression as the meta-learner requires non-negative weights
      - Model trees as the meta-learner
      - Homogeneous stacking using random feature subsets
      - Feature subsets can be improved upon using hill-climbing or GA techniques
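
The non-negativity constraint on the meta-level weights can be illustrated for the smallest case of two base models. The sketch below is an assumption-laden toy, not a general solver: it enumerates the few candidate solutions of the two-variable non-negative least-squares problem (both weights free, one weight clamped to zero, both zero) and keeps the best feasible one. The function names and the three meta-training rows are hypothetical, and the base predictions are assumed not to be all zero.

```python
def sse(rows, w):
    """Sum of squared errors of the weighted combination on the meta-training rows."""
    return sum((w[0] * f[0] + w[1] * f[1] - y) ** 2 for f, y in rows)

def nonnegative_weights(rows):
    """Least-squares weights for two base predictions, constrained to be >= 0."""
    a11 = sum(f[0] * f[0] for f, _ in rows)
    a12 = sum(f[0] * f[1] for f, _ in rows)
    a22 = sum(f[1] * f[1] for f, _ in rows)
    b1 = sum(f[0] * y for f, y in rows)
    b2 = sum(f[1] * y for f, y in rows)
    # Boundary candidates: at least one weight is fixed at zero.
    candidates = [(0.0, 0.0),
                  (max(0.0, b1 / a11), 0.0),
                  (0.0, max(0.0, b2 / a22))]
    det = a11 * a22 - a12 * a12
    if abs(det) > 1e-12:
        # Interior candidate: the unconstrained solution, kept only if feasible.
        w1 = (b1 * a22 - b2 * a12) / det
        w2 = (a11 * b2 - a12 * b1) / det
        if w1 >= 0.0 and w2 >= 0.0:
            candidates.append((w1, w2))
    return min(candidates, key=lambda w: sse(rows, w))

# Two strongly correlated base predictions; the unconstrained fit would give
# the second model a small negative weight.
rows = [((1.0, 2.0), 1.0), ((2.0, 3.0), 2.0), ((3.0, 5.0), 2.8)]
w = nonnegative_weights(rows)
print(w)
```

For more than two base models a proper non-negative least-squares routine would be needed; the enumeration here only works because the problem has two variables.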

  13. Related techniques: multiple meta-levels
      Cascade generalization
      [Diagram: Classifier 1 receives $\langle x \rangle$; Classifier 2 receives $\langle x, P_1(C_1), \ldots, P_1(C_k) \rangle$; Classifier 3 receives $\langle x, P_1(C_1), \ldots, P_1(C_k), P_2(C_1), \ldots, P_2(C_k) \rangle$.]
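
The data flow of cascade generalization, where each classifier sees the original features plus the class distributions of all earlier classifiers, can be sketched as below. Only prediction time is shown. The two toy "classifiers" are hypothetical stand-ins that return a two-class distribution; they are not trained models.

```python
def cascade(x, classifiers):
    """Run classifiers in sequence, appending each class distribution to the features."""
    features = list(x)
    for clf in classifiers[:-1]:
        features = features + list(clf(features))  # <x, P_1(C_1..C_k), ...>
    return features, classifiers[-1](features)

def toy_threshold(features):
    # Hypothetical classifier: looks only at the first original feature.
    p = 1.0 if features[0] > 0.5 else 0.0
    return [1.0 - p, p]

def toy_mean(features):
    # Hypothetical classifier: uses the mean of everything it is given.
    p = min(1.0, max(0.0, sum(features) / len(features)))
    return [1.0 - p, p]

final_features, distribution = cascade([0.8, 0.1], [toy_threshold, toy_mean, toy_mean])
print(final_features)
print(distribution)
```

With two original features and two classes, the third classifier receives six inputs: the instance itself and one distribution from each of the two earlier classifiers.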

  14. Related techniques: multiple meta-levels
      Combiner trees
      [Diagram: Classifier 1 to Classifier 4, trained on disjoint training sets $T_i$, feed Combiner 1 and Combiner 2, whose outputs are merged by Combiner 3.]

  15. Related techniques: dynamic integration
      [Diagram: an instance $x^*$ is passed to the base models $M_1, M_2, \ldots, M_m$, which output the base predictions $f_1(x^*), f_2(x^*), \ldots, f_m(x^*)$. The meta-level model Meta-M is built from the meta-level training set $\{(x_j, Err_1(x_j), \ldots, Err_m(x_j), y_j)\}$, where the base errors are $Err_i(x_j) = |f_i(x_j) - y_j|$, and combines the base predictions into the final prediction $\text{Meta-M}(f_1(x^*), \ldots, f_m(x^*))$.]

  16. Dynamic integration
      Meta-model Meta-M: distance-weighted k-NN
      - NN: the set of k nearest meta-instances
      - Using the members of NN, find the cumulative error of each model

  17. Dynamic integration
      - Dynamic Selection (DS): choose the model with the lowest cumulative error
      - Dynamic Weighting (DW): combine the models with weights based on their cumulative error
      - Dynamic Weighting with Selection (DWS): combine the models as in DW, but exclude models whose cumulative error is larger than the median
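
The three variants can be sketched for one-dimensional instances as follows. The slides do not give the exact formulas, so two choices here are assumptions: neighbours are weighted by inverse distance when accumulating errors, and DW uses weights proportional to the inverse of each model's cumulative error. The meta-level rows and base predictions are made-up numbers.

```python
from statistics import median

def cumulative_errors(x_star, meta_set, k=2, eps=1e-9):
    """Distance-weighted sum of each model's errors over the k nearest meta-instances."""
    nearest = sorted(meta_set, key=lambda row: abs(row[0] - x_star))[:k]
    totals = [0.0] * len(nearest[0][1])
    for x_j, errs in nearest:
        weight = 1.0 / (abs(x_j - x_star) + eps)  # assumed inverse-distance weighting
        for i, err in enumerate(errs):
            totals[i] += weight * err
    return totals

def dynamic_selection(preds, cum_err):
    # DS: use the prediction of the model with the lowest cumulative error.
    best = min(range(len(preds)), key=lambda i: cum_err[i])
    return preds[best]

def dynamic_weighting(preds, cum_err, eps=1e-9):
    # DW: weighted average; the inverse-error weights are an assumption.
    weights = [1.0 / (e + eps) for e in cum_err]
    return sum(w * p for w, p in zip(weights, preds)) / sum(weights)

def dynamic_weighting_with_selection(preds, cum_err):
    # DWS: drop models whose cumulative error exceeds the median, then apply DW.
    cutoff = median(cum_err)
    kept = [i for i in range(len(preds)) if cum_err[i] <= cutoff]
    return dynamic_weighting([preds[i] for i in kept], [cum_err[i] for i in kept])

# Meta-level rows (x_j, [Err_1(x_j), Err_2(x_j), Err_3(x_j)]) with made-up errors.
meta_set = [(0.0, [0.1, 0.5, 0.9]), (1.0, [0.2, 0.4, 0.8]),
            (2.0, [0.9, 0.3, 0.1]), (3.0, [0.8, 0.2, 0.1])]
preds = [10.0, 12.0, 20.0]  # f_1(x*), f_2(x*), f_3(x*) for the query instance
cum_err = cumulative_errors(0.4, meta_set)
ds = dynamic_selection(preds, cum_err)
dw = dynamic_weighting(preds, cum_err)
dws = dynamic_weighting_with_selection(preds, cum_err)
print(ds, dw, dws)
```

Near the query point the first model has the smallest local errors, so DS picks it outright, DW leans toward it, and DWS sits between the two because it discards the locally worst model before weighting.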

  18. Applications
      - Distributed data mining
      - Intrusion detection
      - Concept drift

  19. Key papers
      - Wolpert, D. H.: Stacked Generalization. Neural Networks, 5 (1992) 241-259
      - Breiman, L.: Stacked Regressions. Machine Learning, 24 (1996) 49-64
      - Dietterich, T. G.: Ensemble Methods in Machine Learning. Lecture Notes in Computer Science, 1857 (2000) 1-15
      - Dzeroski, S., & Zenko, B.: Is Combining Classifiers with Stacking Better than Selecting the Best One? Machine Learning, 54 (2004) 255-273
      - Ting, K. M., & Witten, I. H.: Issues in Stacked Generalization. Journal of Artificial Intelligence Research, 10 (1999) 271-289
