

  1. CSC 411 Lecture 5: Ensembles II
     Roger Grosse, Amir-massoud Farahmand, and Juan Carrasquilla
     University of Toronto

  2. Boosting
     Recall that an ensemble is a set of predictors whose individual decisions are combined in some way to classify new examples.
     (Previous lecture) Bagging: Train classifiers independently on random subsets of the training data.
     (This lecture) Boosting: Train classifiers sequentially, each time focusing on training data points that were previously misclassified.
     Let us start with the concept of a weak learner/classifier (also called a base classifier).

  3. Weak Learner/Classifier (Informal)
     A weak learner is a learning algorithm that outputs a hypothesis (e.g., a classifier) that performs slightly better than chance, e.g., it predicts the correct label with probability 0.6.
     We are interested in weak learners that are computationally efficient:
     ◮ Decision trees
     ◮ Even simpler: a decision stump, i.e., a decision tree with only a single split.
     [The formal definition of weak learnability has quantifiers such as "for any distribution over the data" and requires that the guarantee hold only probabilistically.]
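     To make the decision-stump idea concrete, here is a minimal Python sketch (not from the slides) of fitting a stump to weighted data by brute force; fit_stump and its return convention are illustrative, not a library API:

         import numpy as np

         def fit_stump(X, y, w):
             # X: (N, D) features; y: (N,) labels in {-1, +1};
             # w: (N,) nonnegative weights summing to 1.
             # Try every axis-aligned split and keep the one with
             # the smallest weighted error.
             best_err, best_rule = np.inf, None
             for d in range(X.shape[1]):
                 for thr in np.unique(X[:, d]):
                     for sign in (+1, -1):
                         pred = sign * np.where(X[:, d] > thr, 1, -1)
                         err = np.sum(w * (pred != y))
                         if err < best_err:
                             best_err, best_rule = err, (d, thr, sign)
             return best_rule, best_err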

  4. Weak Classifiers
     These weak classifiers, which are decision stumps, consist of the set of horizontal and vertical half spaces.
     [Figure: decision stumps shown as "Vertical half spaces" and "Horizontal half spaces"]

  5. Weak Classifiers
     [Figure: the same vertical and horizontal half spaces as above]
     A single weak classifier is not capable of making the training error very small. It only performs slightly better than chance, i.e., the error of a classifier h according to the given weights w = (w_1, \ldots, w_N) (with \sum_{i=1}^{N} w_i = 1 and w_i \geq 0),

         err = \sum_{i=1}^{N} w_i \,\mathbb{I}\{h(x_i) \neq y_i\},

     is at most \frac{1}{2} - \gamma for some \gamma > 0.
     Can we combine a set of weak classifiers in order to make a better ensemble of classifiers?
     Boosting: Train classifiers sequentially, each time focusing on training data points that were previously misclassified.
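     As a quick numeric illustration of this weighted error (a sketch with made-up labels and predictions, not data from the lecture):

         import numpy as np

         y = np.array([1, 1, -1, -1, 1])     # true labels in {-1, +1}
         h = np.array([1, -1, -1, 1, 1])     # hypothetical classifier outputs
         w = np.full(5, 1 / 5)               # uniform weights, summing to 1

         err = np.sum(w * (h != y))          # weighted 0-1 error
         print(err)                          # 0.4, i.e. at most 1/2 - gamma for gamma = 0.1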

  6. AdaBoost (Adaptive Boosting)
     Key steps of AdaBoost:
     1. At each iteration we re-weight the training samples by assigning larger weights to samples (i.e., data points) that were classified incorrectly.
     2. We train a new weak classifier based on the re-weighted samples.
     3. We add this weak classifier to the ensemble of classifiers. This is our new classifier.
     4. We repeat the process many times.
     The weak learner needs to minimize the weighted error. AdaBoost reduces bias by making each classifier focus on previous mistakes.
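     The loop above can be sketched in a few lines of Python, assuming labels in {-1, +1} and a depth-1 sklearn tree as the weak learner; note that the exponential re-weighting rule below is the standard AdaBoost update, which this slide only describes informally:

         import numpy as np
         from sklearn.tree import DecisionTreeClassifier

         def adaboost(X, y, T):
             N = len(y)
             w = np.full(N, 1 / N)                      # start with uniform weights
             hs, alphas = [], []
             for t in range(T):
                 # Step 2: train a weak classifier on the re-weighted samples.
                 h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
                 pred = h.predict(X)
                 err = np.clip(np.sum(w * (pred != y)) / np.sum(w), 1e-10, 1 - 1e-10)
                 alpha = 0.5 * np.log((1 - err) / err)  # classifier coefficient
                 # Step 1 (for the next round): upweight misclassified points.
                 w = w * np.exp(-alpha * y * pred)
                 w = w / np.sum(w)
                 hs.append(h)
                 alphas.append(alpha)
             # Steps 3-4: the ensemble predicts with a weighted vote.
             return lambda Xn: np.sign(sum(a * h.predict(Xn) for a, h in zip(alphas, hs)))

     Calling H = adaboost(X, y, T=3) and then H(X_test) yields predictions of the form sign(\alpha_1 h_1(x) + \ldots + \alpha_T h_T(x)), which is exactly the H(x) built up in the example that follows.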

  7. AdaBoost Example
     [Figure: training data, two classes of points in 2D. Slide credit: Verma & Thrun]

  8. AdaBoost Example: Round 1
     w = \left(\frac{1}{10}, \ldots, \frac{1}{10}\right)
     ⇒ Train a classifier h_1 (using w)
     ⇒ err_1 = \frac{\sum_{i=1}^{10} w_i \,\mathbb{I}\{h_1(x^{(i)}) \neq t^{(i)}\}}{\sum_{i=1}^{N} w_i} = \frac{3}{10}
     ⇒ \alpha_1 = \frac{1}{2}\log\frac{1 - err_1}{err_1} = \frac{1}{2}\log\left(\frac{1}{0.3} - 1\right) \approx 0.42
     ⇒ H(x) = sign(\alpha_1 h_1(x))
     [Slide credit: Verma & Thrun]

  9. AdaBoost Example: Round 2
     w = updated weights
     ⇒ Train a classifier h_2 (using w)
     ⇒ err_2 = \frac{\sum_{i=1}^{10} w_i \,\mathbb{I}\{h_2(x^{(i)}) \neq t^{(i)}\}}{\sum_{i=1}^{N} w_i} = 0.21
     ⇒ \alpha_2 = \frac{1}{2}\log\frac{1 - err_2}{err_2} = \frac{1}{2}\log\left(\frac{1}{0.21} - 1\right) \approx 0.66
     ⇒ H(x) = sign(\alpha_1 h_1(x) + \alpha_2 h_2(x))
     [Slide credit: Verma & Thrun]

  10. AdaBoost Example: Round 3
      w = updated weights
      ⇒ Train a classifier h_3 (using w)
      ⇒ err_3 = \frac{\sum_{i=1}^{10} w_i \,\mathbb{I}\{h_3(x^{(i)}) \neq t^{(i)}\}}{\sum_{i=1}^{N} w_i} = 0.14
      ⇒ \alpha_3 = \frac{1}{2}\log\frac{1 - err_3}{err_3} = \frac{1}{2}\log\left(\frac{1}{0.14} - 1\right) \approx 0.91
      ⇒ H(x) = sign(\alpha_1 h_1(x) + \alpha_2 h_2(x) + \alpha_3 h_3(x))
      [Slide credit: Verma & Thrun]
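      As a sanity check on the three coefficients above, the arithmetic can be reproduced directly from the weighted errors 0.3, 0.21, and 0.14 reported in the three rounds:

          import numpy as np

          for t, err in enumerate([0.30, 0.21, 0.14], start=1):
              alpha = 0.5 * np.log((1 - err) / err)
              print(f"alpha_{t} = {alpha:.2f}")   # alpha_1 = 0.42, alpha_2 = 0.66, alpha_3 = 0.91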

  11. AdaBoost Example: Final Classifier
      [Figure: the final classifier H(x) = sign(\alpha_1 h_1(x) + \alpha_2 h_2(x) + \alpha_3 h_3(x)) combining the three stumps. Slide credit: Verma & Thrun]
