
Lecture 20: AdaBoost. Aykut Erdem, December 2017, Hacettepe University



  1. Lecture 20: AdaBoost. Aykut Erdem, December 2017, Hacettepe University

  2. Last time… Bias/Variance Tradeoff. Graphical illustration of bias and variance: http://scott.fortmann-roe.com/docs/BiasVariance.html (slide by David Sontag)

  3. Last time… Bagging • Leo Breiman (1994) • Take repeated bootstrap samples from the training set D. • Bootstrap sampling: given a set D containing N training examples, create D′ by drawing N examples at random with replacement from D. • Bagging: - Create k bootstrap samples D_1, ..., D_k. - Train a distinct classifier on each D_i. - Classify a new instance by majority vote / average. (slide by David Sontag)
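
A minimal sketch of the procedure described on this slide, in Python (NumPy and a scikit-learn decision tree are illustrative choices of mine, not specified in the lecture):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier  # assumed base learner, not given on the slide

    def bagging_fit(X, y, k=25, seed=0):
        """Train k classifiers, each on a bootstrap sample D_i drawn from (X, y)."""
        rng = np.random.default_rng(seed)
        n = len(X)
        models = []
        for _ in range(k):
            idx = rng.integers(0, n, size=n)   # draw N examples at random with replacement
            models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        return models

    def bagging_predict(models, X):
        """Classify a new instance by majority vote (labels assumed to be in {-1, +1})."""
        votes = np.stack([m.predict(X) for m in models])
        return np.sign(votes.sum(axis=0))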

  4. Last time… Random Forests. [Figure: example trees t = 1, 2, 3 of a random forest, from the book of Hastie, Tibshirani and Friedman] (slide by Nando de Freitas)

  5. Last time… Boosting • Idea: given a weak learner, run it multiple times on (reweighted) training data, then let the learned classifiers vote. • On each iteration t: - weight each training example by how incorrectly it was classified - learn a hypothesis h_t - assign a strength α_t to this hypothesis • Final classifier: a linear combination of the votes of the different classifiers, weighted by their strengths. • Practically useful and theoretically interesting. (slide by Aarti Singh & Barnabas Poczos)
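
Concretely, the final classifier sketched on this slide is the weighted vote H(x) = sign(Σ_{t=1}^T α_t h_t(x)), where α_t is the strength assigned to hypothesis h_t; this is exactly the output formula that appears on the algorithm slides later in the lecture.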

  6. The AdaBoost Algorithm

  7. Voted combination of classifiers • The general problem here is to combine many simple “weak” classifiers into a single “strong” classifier. • We consider voted combinations of simple binary ±1 component classifiers, where the (non-negative) votes α_i can be used to emphasize component classifiers that are more reliable than others. (slide by Tommi S. Jaakkola)
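
The combination referred to here appears as an equation image in the original deck; a reconstruction consistent with the rest of the lecture is

    h_m(x) = α_1 h(x; θ_1) + ... + α_m h(x; θ_m),   α_j ≥ 0,

with the predicted label given by sign(h_m(x)).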

  8. Components: Decision stumps • Consider the following simple family of component classifiers generating ±1 labels, called decision stumps. • Each decision stump pays attention to only a single component of the input vector. (slide by Tommi S. Jaakkola)
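
A minimal decision stump in Python (a sketch; the feature/threshold/sign parameterization and the weighted-error search are my own choices, anticipating how stumps are reused with example weights in the AdaBoost sketch later):

    import numpy as np

    def stump_predict(X, k, thresh, s):
        """A decision stump looks only at feature k: label = +s if x_k > thresh else -s."""
        return np.where(X[:, k] > thresh, s, -s)

    def fit_stump(X, y, w):
        """Exhaustive search for the (feature, threshold, sign) with the lowest weighted error."""
        n, d = X.shape
        best_err, best_params = np.inf, None
        for k in range(d):
            for thresh in np.unique(X[:, k]):
                for s in (-1.0, 1.0):
                    err = np.sum(w * (stump_predict(X, k, thresh, s) != y))
                    if err < best_err:
                        best_err, best_params = err, (k, thresh, s)
        return best_params, best_err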

  9. Voted combinations (cont’d.) • We need to define a loss function for the combination so that we can determine which new component h(x; θ) to add and how many votes it should receive. • While there are many options for the loss function, we consider here only a simple exponential loss. (slide by Tommi S. Jaakkola)
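
The loss referred to here (shown as an equation image in the deck) is the exponential loss Loss(y, h_m(x)) = exp(−y h_m(x)), so over a training set {(x_i, y_i)} the empirical loss of the voted combination is Σ_{i=1}^n exp(−y_i h_m(x_i)); this is what the following slides minimize one component at a time.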

  10. Modularity, errors, and loss • Consider adding the m-th component. (slide by Tommi S. Jaakkola)

  11. Modularity, errors, and loss • Consider adding the m-th component. (slide by Tommi S. Jaakkola)

  12. Modularity, errors, and loss • Consider adding the m-th component. • So at the m-th iteration the new component (and the votes) should optimize a weighted loss (weighted towards mistakes). (slide by Tommi S. Jaakkola)
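
The derivation these three slides walk through (the equations are images in the deck) can be reconstructed as follows: adding the m-th component changes the empirical exponential loss to

    Σ_i exp(−y_i [h_{m−1}(x_i) + α_m h(x_i; θ_m)]) = Σ_i W_i^{(m−1)} exp(−α_m y_i h(x_i; θ_m)),

where W_i^{(m−1)} = exp(−y_i h_{m−1}(x_i)) is large exactly for the examples the current combination gets wrong, which is the sense in which the new component optimizes a loss weighted towards mistakes.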

  13. Empirical exponential loss (cont’d.) • To increase modularity we’d like to further decouple the optimization of h(x; θ_m) from the associated votes α_m. • To this end we select the h(x; θ_m) that optimizes the rate at which the loss would decrease as a function of α_m. (slide by Tommi S. Jaakkola)
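
Reconstructing the criterion: the rate at which the loss decreases as a function of α_m, evaluated at α_m = 0, is

    d/dα_m [ Σ_i W_i^{(m−1)} exp(−α_m y_i h(x_i; θ_m)) ] |_{α_m = 0} = −Σ_i W_i^{(m−1)} y_i h(x_i; θ_m),

so the new component h(x; θ_m) is chosen to make this derivative as negative as possible, independently of the value α_m will eventually take.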

  14. Empirical exponential loss (cont’d.) • We find the component h(x; θ_m) that minimizes this derivative criterion. • We can also normalize the weights so that they define a distribution over the training examples (see the reconstruction below). (slide by Tommi S. Jaakkola)
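
The normalization referred to on this slide is W̃_i^{(m−1)} = W_i^{(m−1)} / Σ_j W_j^{(m−1)}, so that Σ_i W̃_i^{(m−1)} = 1; this turns the weights into a distribution over training examples, matching the D_t(i) used in the algorithm slides that follow.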

  15. Empirical exponential loss (cont’d.) • We find that the chosen component minimizes a weighted training error, where the weights are the normalized W̃_i. • The vote α_m is subsequently chosen to minimize the resulting empirical exponential loss. (slide by Tommi S. Jaakkola)
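
Written out (a reconstruction consistent with the algorithm slides that follow): the chosen component minimizes the weighted error ε_m = Σ_i W̃_i^{(m−1)} ⟦y_i ≠ h(x_i; θ_m)⟧, and minimizing the empirical exponential loss over α_m then gives the closed form α_m = (1/2) log((1 − ε_m)/ε_m), which is exactly the α_t used in the AdaBoost pseudocode below.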

  16. The AdaBoost Algorithm (slide by Jiri Matas and Jan Šochman)

  17. The AdaBoost Algorithm Given: (x_1, y_1), ..., (x_m, y_m); x_i ∈ X, y_i ∈ {−1, +1}. (slide by Jiri Matas and Jan Šochman)

  18. The AdaBoost Algorithm Given: (x_1, y_1), ..., (x_m, y_m); x_i ∈ X, y_i ∈ {−1, +1}. Initialise weights D_1(i) = 1/m. (slide by Jiri Matas and Jan Šochman)

  19. The AdaBoost Algorithm Given: (x_1, y_1), ..., (x_m, y_m); x_i ∈ X, y_i ∈ {−1, +1}. Initialise weights D_1(i) = 1/m. (demo: t = 1) For t = 1, ..., T: • Find h_t = argmin_{h_j ∈ H} ε_j, where ε_j = Σ_{i=1}^m D_t(i) ⟦y_i ≠ h_j(x_i)⟧ • If ε_t ≥ 1/2 then stop. (slide by Jiri Matas and Jan Šochman)

  20. The AdaBoost Algorithm Given: (x_1, y_1), ..., (x_m, y_m); x_i ∈ X, y_i ∈ {−1, +1}. Initialise weights D_1(i) = 1/m. (demo: t = 1) For t = 1, ..., T: • Find h_t = argmin_{h_j ∈ H} ε_j, where ε_j = Σ_{i=1}^m D_t(i) ⟦y_i ≠ h_j(x_i)⟧ • If ε_t ≥ 1/2 then stop • Set α_t = (1/2) log((1 − ε_t)/ε_t). (slide by Jiri Matas and Jan Šochman)

  21. The AdaBoost Algorithm Given: (x_1, y_1), ..., (x_m, y_m); x_i ∈ X, y_i ∈ {−1, +1}. Initialise weights D_1(i) = 1/m. (demo: t = 1) For t = 1, ..., T: • Find h_t = argmin_{h_j ∈ H} ε_j, where ε_j = Σ_{i=1}^m D_t(i) ⟦y_i ≠ h_j(x_i)⟧ • If ε_t ≥ 1/2 then stop • Set α_t = (1/2) log((1 − ε_t)/ε_t) • Update D_{t+1}(i) = D_t(i) exp(−α_t y_i h_t(x_i)) / Z_t, where Z_t is a normalisation factor. (slide by Jiri Matas and Jan Šochman)

  22. The AdaBoost Algorithm Given: (x_1, y_1), ..., (x_m, y_m); x_i ∈ X, y_i ∈ {−1, +1}. Initialise weights D_1(i) = 1/m. (demo: t = 1) For t = 1, ..., T: • Find h_t = argmin_{h_j ∈ H} ε_j, where ε_j = Σ_{i=1}^m D_t(i) ⟦y_i ≠ h_j(x_i)⟧ • If ε_t ≥ 1/2 then stop • Set α_t = (1/2) log((1 − ε_t)/ε_t) • Update D_{t+1}(i) = D_t(i) exp(−α_t y_i h_t(x_i)) / Z_t, where Z_t is a normalisation factor. Output the final classifier: H(x) = sign(Σ_{t=1}^T α_t h_t(x)). [Figure: training error vs. boosting step] (slide by Jiri Matas and Jan Šochman)
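
A compact Python sketch of the algorithm on this slide, reusing the stump helpers sketched earlier (function and variable names are mine; the small clamp on ε_t is a numerical safeguard, not part of the slide):

    import numpy as np

    def adaboost_fit(X, y, T=40):
        """AdaBoost with decision stumps; labels y must be in {-1, +1}."""
        n = len(X)
        D = np.full(n, 1.0 / n)                       # D_1(i) = 1/m
        ensemble = []                                 # list of (alpha_t, stump parameters)
        for _ in range(T):
            (k, thresh, s), eps = fit_stump(X, y, D)  # weak learner with minimum weighted error
            if eps >= 0.5:                            # stop if no weak learner beats chance
                break
            eps = max(eps, 1e-12)                     # avoid division by zero when eps = 0
            alpha = 0.5 * np.log((1.0 - eps) / eps)
            pred = stump_predict(X, k, thresh, s)
            D = D * np.exp(-alpha * y * pred)         # up-weight mistakes, down-weight correct examples
            D = D / D.sum()                           # Z_t normalisation
            ensemble.append((alpha, (k, thresh, s)))
        return ensemble

    def adaboost_predict(ensemble, X):
        """Final classifier H(x) = sign(sum_t alpha_t h_t(x))."""
        score = np.zeros(len(X))
        for alpha, (k, thresh, s) in ensemble:
            score += alpha * stump_predict(X, k, thresh, s)
        return np.sign(score)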

  23. The AdaBoost Algorithm (demo: t = 2). Same algorithm as slide 22, with the training-error plot after two rounds. (slide by Jiri Matas and Jan Šochman)

  24. The AdaBoost Algorithm (demo: t = 3). As above, after three rounds.

  25. The AdaBoost Algorithm (demo: t = 4). As above, after four rounds.

  26. The AdaBoost Algorithm (demo: t = 5). As above, after five rounds.

  27. The AdaBoost Algorithm (demo: t = 6). As above, after six rounds.

  28. The AdaBoost Algorithm (demo: t = 7). As above, after seven rounds. (slide by Jiri Matas and Jan Šochman)
