CS570 Data Mining Classification: Ensemble Methods


  1. CS570 Data Mining Classification: Ensemble Methods
  Cengiz Günay, Dept. Math & CS, Emory University, Fall 2013
  Some slides courtesy of Han-Kamber-Pei, Tan et al., and Li Xiong

  2. Today
  • Due today midnight: Homework #2 – Frequent itemsets
  • Given today: Homework #3 – Classification
  • Today’s menu: Classification: Ensemble Methods

  3. Ensemble Methods
  • Given a data set, generate multiple models and combine their results
  • Bagging
  • Random forests
  • Boosting – significance for PAC learning

  4. General Idea

  5. Why does it work?
  • Suppose there are 25 base classifiers, each with error rate ε = 0.35
  • Assume the classifiers are independent
  • The majority vote errs only when 13 or more of the 25 base classifiers err, so the probability that the ensemble makes a wrong prediction is
    $\sum_{i=13}^{25} \binom{25}{i} \varepsilon^i (1 - \varepsilon)^{25-i} \approx 0.06$
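
  A minimal Python check of that number (just the binomial tail sum above; the variable names are mine):

```python
# Probability that a majority vote of 25 independent base classifiers errs:
# the ensemble is wrong only when 13 or more of them are wrong at once.
from math import comb

eps, n = 0.35, 25
p_wrong = sum(comb(n, i) * eps**i * (1 - eps)**(n - i) for i in range(13, n + 1))
print(f"ensemble error = {p_wrong:.3f}")  # ~0.06, vs. 0.35 for a single classifier
```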

  6–9. Types of Ensemble Methods. Ensembles can be obtained by manipulating:
  1. Training set: bagging, boosting
  2. Input features: random forests, multi-objective evolutionary algorithms, forward/backward elimination?
  3. Class labels: multi-classes, active learning
  4. Learning algorithm: ANNs, decision trees

  10. Bagging
  • Create a training set by sampling data points with replacement (a bootstrap sample)
  • Build a model on that sample
  • Generate more bootstrap samples and models the same way
  • Predict by combining the models’ votes:
    – Classification: majority vote
    – Numeric prediction: average

  11. Bagging: sampling with replacement

    Original data      1   2   3   4   5   6   7   8   9   10
    Bagging (Round 1)  7   8   10  8   2   5   10  10  5   9
    Bagging (Round 2)  1   4   9   1   2   3   2   7   3   2
    Bagging (Round 3)  1   8   5   10  5   5   9   6   3   7

  • Build a classifier on each bootstrap sample
  • Each data point has probability 1 − (1 − 1/n)^n of being selected for a given bootstrap sample (about 0.632 for large n); (1 − 1/n)^n is the probability of being left out
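
  A minimal sketch of this loop, assuming X and y are numpy arrays with integer class labels and using scikit-learn decision trees as the base learner; bagging_fit and bagging_predict are hypothetical names, not from the slides:

```python
# Bagging sketch: one classifier per bootstrap sample, majority vote to predict.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_models=10, seed=0):
    rng = np.random.default_rng(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        # Sample n indices WITH replacement; on average about 63.2% of the
        # distinct points appear in each sample: 1 - (1 - 1/n)**n.
        idx = rng.integers(0, n, size=n)
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    votes = np.stack([m.predict(X) for m in models])  # shape (n_models, n_samples)
    # Majority vote down each column; assumes small non-negative integer labels.
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```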

  12. Bagging
  Advantages:
  • Less overfitting
  • Helps when the base classifier is unstable (has high variance)
  Disadvantages:
  • Not useful when the base classifier is stable and has large bias

  13. PAC learning
  • A model of learning that achieves a given accuracy and confidence using polynomial sample complexity
  • References:
    – L. Valiant, “A theory of the learnable”: http://web.mit.edu/6.435/www/Valiant84.pdf
    – D. Haussler, “Overview of the Probably Approximately Correct (PAC) Learning Framework”: http://www.cs.iastate.edu/~honavar/pac.pdf

  14. Boosting
  • Combines weak learners to form a strong learner, in the PAC-learning sense
  • Learn using a weak learner
  • Boost accuracy by reweighting the examples misclassified by the previous weak learner, forcing the next weak learner to focus on these “hard” examples
  • Predict using a weighted combination of the weak learners:
    – Each learner’s weight is determined by its accuracy

  15. Boosting  An iterative procedure to adaptively change distribution of training data by focusing more on previously misclassified records Initially, all N records are assigned equal weights  Unlike bagging, weights may change at the end of boosting  round

  16. Boosting  Records that are wrongly classified will have their weights increased  Records that are classified correctly will have their weights decreased Original Data 1 2 3 4 5 6 7 8 9 10 Boosting (Round 1) 7 3 2 8 7 9 4 10 6 3 Boosting (Round 2) 5 4 9 4 2 5 1 7 4 2 Boosting (Round 3) 4 4 8 10 4 5 4 6 3 4 • Example 4 is hard to classify • Its weight is increased, therefore it is more likely to be chosen again in subsequent rounds

  17. Boosting
  Advantages:
  • Focuses on samples that are hard to classify
  • Sample weights can be used:
    1. As sampling probabilities
    2. By the classifier, to value those samples more
  • AdaBoost: calculates classifier importance instead of simple voting, using exponential weight-update rules
  Disadvantage:
  • Susceptible to overfitting

  18. Example: AdaBoost
  • Base classifiers: $C_1, C_2, \ldots, C_T$
  • Error rate of classifier $C_i$:
    $\varepsilon_i = \frac{1}{N} \sum_{j=1}^{N} w_j \, \delta(C_i(x_j) \neq y_j)$
  • Importance of a classifier:
    $\alpha_i = \frac{1}{2} \ln\left(\frac{1 - \varepsilon_i}{\varepsilon_i}\right)$

  19. Example: AdaBoost
  • Weight update:
    $w_i^{(j+1)} = \frac{w_i^{(j)}}{Z_j} \times \begin{cases} e^{-\alpha_j} & \text{if } C_j(x_i) = y_i \\ e^{\alpha_j} & \text{if } C_j(x_i) \neq y_i \end{cases}$
    where $Z_j$ is the normalization factor
  • If any intermediate round produces an error rate higher than 50%, the weights are reverted back to 1/n and the resampling procedure is repeated
  • Classification:
    $C^*(x) = \arg\max_y \sum_{j=1}^{T} \alpha_j \, \delta(C_j(x) = y)$
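
  A minimal AdaBoost sketch following the formulas on slides 18–19, assuming labels y in {−1, +1} and scikit-learn decision stumps as the weak learners (sample weights passed to fit stand in for explicit resampling); the function names are mine:

```python
# AdaBoost sketch: weighted error eps, importance alpha, exponential
# weight updates, and a weighted vote for the final classification.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, T=10):
    n = len(X)
    w = np.full(n, 1.0 / n)                    # initially equal weights
    models, alphas = [], []
    for _ in range(T):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        miss = stump.predict(X) != y
        eps = w @ miss                         # weighted error rate
        if eps >= 0.5:                         # slide 19: revert weights to 1/n
            w = np.full(n, 1.0 / n)
            continue
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))  # importance
        w = w * np.exp(np.where(miss, alpha, -alpha))      # up-/down-weight
        w = w / w.sum()                        # divide by normalizer Z_j
        models.append(stump)
        alphas.append(alpha)
    return models, alphas

def adaboost_predict(models, alphas, X):
    # C*(x) = sign(sum_j alpha_j * C_j(x)), the weighted vote for {-1,+1} labels
    return np.sign(sum(a * m.predict(X) for m, a in zip(models, alphas)))
```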

  20. Illustrating AdaBoost: initial (equal) weights for each training data point. (Figure © Vipin Kumar, Parallel Issues in Data Mining)

  21. Illustrating AdaBoost, continued. (Figure © Vipin Kumar, Parallel Issues in Data Mining)

  22. Random Forests
  • Sample a data set with replacement
  • Select m variables at random from the p variables
  • Create a tree
  • Similarly create more trees
  • Combine the results
  • Reference: Hastie, Tibshirani & Friedman, The Elements of Statistical Learning, Chapter 15
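
  A minimal sketch using scikit-learn’s RandomForestClassifier on synthetic data (the dataset and parameter values are illustrative); max_features caps the m features considered at each split, mirroring the random variable selection above:

```python
# Random forest sketch: many trees on bootstrap samples, each split
# restricted to a random subset of features, predictions combined by vote.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
forest = RandomForestClassifier(
    n_estimators=100,      # number of trees, each grown on a bootstrap sample
    max_features="sqrt",   # m = sqrt(p) features per split; slide 23 uses log2(d) + 1
    random_state=0,
).fit(X, y)
print(forest.score(X, y))  # training accuracy of the combined vote
```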

  23. Random Forests
  • Applies only to decision trees
  Advantages:
  • Lowers generalization error
  • Uses randomization in tree construction: number of candidate features m = log2(d) + 1
  • Accuracy comparable to AdaBoost, but faster
  • See the table in Tan et al., p. 294, for a comparison of ensemble methods
