

  1. Lab #10: Demonstration of AdaBoost
     CS109A Introduction to Data Science
     Pavlos Protopapas, Kevin Rader, and Chris Tanner

  2. Our Data

     Training Data:

     Chest Pain | Blocked Arteries | Patient Weight | Heart Disease
     -----------|------------------|----------------|--------------
     Yes        | Yes              | 205            | Yes
     No         | Yes              | 180            | Yes
     Yes        | No               | 210            | Yes
     Yes        | Yes              | 167            | Yes
     No         | Yes              | 156            | No
     No         | Yes              | 125            | No
     Yes        | No               | 168            | No
     Yes        | Yes              | 172            | No
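
A minimal sketch of this dataset as code, used by the examples on the following slides (the 0/1 encoding and the names df, X, y are my own choices, not from the slides):

```python
import pandas as pd

# The 8-patient toy dataset from the slide, with Yes/No encoded as 1/0
# so that scikit-learn can consume it directly.
df = pd.DataFrame({
    "chest_pain":       [1, 0, 1, 1, 0, 0, 1, 1],
    "blocked_arteries": [1, 1, 0, 1, 1, 1, 0, 1],
    "patient_weight":   [205, 180, 210, 167, 156, 125, 168, 172],
    "heart_disease":    [1, 1, 1, 1, 0, 0, 0, 0],
})
X = df.drop(columns="heart_disease")  # features
y = df["heart_disease"]               # target
```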

  3. Bagging

     1. Shuffle (i.e., bootstrap the data)
     2. Train a new decision tree T_i

     (training data table as on slide 2)

  4. Bagging

     Do N times:
     1. Shuffle (i.e., bootstrap the data)
     2. Train a new decision tree T_i

     We now have {T_1, T_2, T_3, ..., T_N}.

     (training data table as on slide 2)

  5. Bagging

     Do N times:
     1. Shuffle (i.e., bootstrap the data)
     2. Train a new decision tree T_i

     We now have {T_1, T_2, T_3, ..., T_N}.

     Testing Data:

     Chest Pain | Blocked Arteries | Patient Weight | Heart Disease
     No         | Yes              | 158            | ?

     (training data table as on slide 2)

  6. Bagging

     Do N times:
     1. Shuffle (i.e., bootstrap the data)
     2. Train a new decision tree T_i

     We now have {T_1, T_2, T_3, ..., T_N}.

     Take a majority vote from {T_1, T_2, T_3, ..., T_N}.

     Testing Data:

     Chest Pain | Blocked Arteries | Patient Weight | Heart Disease
     No         | Yes              | 158            | ?

     (training data table as on slide 2)
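
A hedged sketch of this bagging procedure with scikit-learn (argument names follow scikit-learn >= 1.2; N = 100 is an illustrative choice, and X, y come from the slide-2 sketch):

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Fit N trees, each on a bootstrap resample of the training data,
# then predict by majority vote across the N trees.
bagger = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=100,  # N
    bootstrap=True,    # step 1: resample with replacement
)
bagger.fit(X, y)

# The test patient: Chest Pain = No, Blocked Arteries = Yes, Weight = 158.
print(bagger.predict([[0, 1, 158]]))
```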

  7. Random Forest

     1. Shuffle (i.e., bootstrap the data)
     2. Select a random subset of the features
     3. Train a new decision tree T_i

     (training data table as on slide 2)

  8. Random Forest

     Do N times:
     1. Shuffle (i.e., bootstrap the data)
     2. Select a random subset of the features
     3. Train a new decision tree T_i

     We now have {T_1, T_2, T_3, ..., T_N}.

     (training data table as on slide 2)

  9. Random Forest

     Do N times:
     1. Shuffle (i.e., bootstrap the data)
     2. Select a random subset of the features
     3. Train a new decision tree T_i

     We now have {T_1, T_2, T_3, ..., T_N}.

     Take a majority vote from {T_1, T_2, T_3, ..., T_N}.

     Testing Data:

     Chest Pain | Blocked Arteries | Patient Weight | Heart Disease
     No         | Yes              | 158            | ?

     (training data table as on slide 2)
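
The scikit-learn analogue, with the caveat that the library draws a fresh random feature subset at each split rather than once per tree (hyperparameter values are illustrative):

```python
from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(
    n_estimators=100,     # N trees, each on a bootstrap resample
    max_features="sqrt",  # random feature subset considered per split
    bootstrap=True,
)
forest.fit(X, y)
print(forest.predict([[0, 1, 158]]))  # majority vote over the N trees
```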

  10. Ideas

      We have {T_1, T_2, T_3, ..., T_N}.

      “Fool me once, shame on ... shame on you. Fool me – you can’t get fooled again” –George W. Bush

      “Fool me once, shame on you; fool me twice, shame on me” –Proverb

      (training data table as on slide 2)

  11. Ideas

      We have {T_1, T_2, T_3, ..., T_N}.

      “Fool me once, shame on ... shame on you. Fool me – you can’t get fooled again” –George W. Bush

      “Fool me once, shame on you; fool me twice, shame on me” –Proverb

      Let’s learn from our mistakes!

      (training data table as on slide 2)

  12. Gradient Boosting

      We have {T_1, T_2, T_3, ..., T_N}.

      (training data table as on slide 2)

  13. Gradient Boosting

      Each T_i is:
      • a “weak”/simple decision tree
      • built after the previous tree
      • trained to learn the shortcomings (the errors/residuals) of the previous tree’s predictions

      (training data table as on slide 2)
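
A minimal from-scratch sketch of this residual-fitting loop for a regression target (the function name, the fixed learning rate, and the stump depth are my own illustrative choices):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_trees=100, lam=0.1):
    """Sequentially fit weak trees, each one to the residuals
    (y minus the ensemble's current prediction)."""
    pred = np.zeros(len(y), dtype=float)
    trees = []
    for _ in range(n_trees):
        tree = DecisionTreeRegressor(max_depth=1)  # a weak/simple tree
        tree.fit(X, y - pred)           # learn the previous shortcomings
        pred += lam * tree.predict(X)   # shrink and add to the ensemble
        trees.append(tree)
    return trees
```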

  14.–21. Gradient Boosting: illustration
      (a sequence of eight figure-only slides; images not preserved in this transcript)

  22. Gradient Boosting

      We have {T_1, T_2, T_3, ..., T_N}.

      We can determine each λ_i by using gradient descent.

      (training data table as on slide 2)
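
In practice λ is usually treated as a tunable shrinkage parameter; scikit-learn's off-the-shelf version of the loop sketched above exposes it as learning_rate (values here are illustrative):

```python
from sklearn.ensemble import GradientBoostingRegressor

# Each stage fits a shallow tree to the current residuals (the negative
# gradient of squared loss) and adds it, scaled by learning_rate.
gbr = GradientBoostingRegressor(
    n_estimators=100,
    learning_rate=0.1,  # the lambda of the slides
    max_depth=1,
)
```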

  23. Idea

      If the outcome is categorical (a classification task, not a regression task), we can use AdaBoost.

      (training data table as on slide 2)

  24. Idea

      If the outcome is categorical (a classification task, not a regression task), we can use AdaBoost:

      1. Train a single weak (stump) decision tree T_i
      2. Calculate the total error of your predictions
      3. Use this error to determine how much stock (λ_i) to place in that tree
      4. Update the weights of each observation
      5. Update our running model T

      (training data table as on slide 2)
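
A hedged scikit-learn sketch of these five steps (the library performs steps 2–5 internally; argument names follow scikit-learn >= 1.2, and the hyperparameters are illustrative):

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# max_depth=1 gives the single "stump" weak learner of step 1.
ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=50,
)
ada.fit(X, y)
print(ada.predict([[0, 1, 158]]))  # the test patient from slide 5
```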

  25. AdaBoost

      With a minor adjustment to the exponential loss function, gradient descent gives us the AdaBoost algorithm:

      1. Choose an initial distribution over the training data: w_n = 1/N.
      2. At the i-th step, fit a simple classifier T^(i) on the weighted training data
         {(x_1, w_1 y_1), ..., (x_N, w_N y_N)}.
      3. Update the weights:
         w_n ← w_n exp(−λ y_n T^(i)(x_n)) / Z,
         where Z is the normalizing constant for the collection of updated weights.
      4. Update T: T ← T + λ^(i) T^(i), where λ is the learning rate.
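
A from-scratch sketch of steps 1–4 (my own illustration, assuming labels coded as −1/+1, e.g. via 2*y − 1; the clipping of the error is a numerical safeguard, not part of the slide):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, n_rounds=50):
    """AdaBoost for labels y in {-1, +1}. Returns (stumps, lambdas)."""
    y = np.asarray(y)
    n = len(y)
    w = np.full(n, 1.0 / n)                  # step 1: w_n = 1/N
    stumps, lambdas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)     # step 2: fit on weighted data
        pred = stump.predict(X)
        eps = np.clip(w[pred != y].sum(), 1e-10, 1 - 1e-10)
        lam = 0.5 * np.log((1 - eps) / eps)  # optimal step (see slide 34)
        w = w * np.exp(-lam * y * pred)      # step 3: re-weight ...
        w = w / w.sum()                      # ... and divide by Z
        stumps.append(stump)
        lambdas.append(lam)
    return stumps, lambdas

def ensemble_predict(stumps, lambdas, X):
    # step 4: T <- T + lambda^(i) T^(i), then take the sign
    scores = sum(lam * s.predict(X) for s, lam in zip(stumps, lambdas))
    return np.sign(scores)
```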

  26. AdaBoost: start with equal weights

  27. AdaBoost: fit a simple decision tree

  28. AdaBoost: update the weights

  29. AdaBoost: fit another simple decision tree on re-weighted data

  30. AdaBoost: add the new model to the ensemble: T ← T + λ^(i) T^(i)

  31. AdaBoost: update the weights

  32. AdaBoost: fit a third simple decision tree on re-weighted data

  33. AdaBoost: add the new model to the ensemble, and repeat: T ← T + λ^(i) T^(i)

      (figure-only illustration slides; images not preserved in this transcript)

  34. Choosing the Learning Rate

      Unlike in the case of gradient boosting for regression, we can analytically solve for the optimal learning rate for AdaBoost by optimizing the exponential loss:

      λ^(i) = argmin_λ Σ_n w_n exp(−λ y_n T^(i)(x_n))

      Doing so, we get:

      λ^(i) = (1/2) ln((1 − ε^(i)) / ε^(i))

      where ε^(i) is the weighted classification error of T^(i).
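
A quick numeric sanity check of this closed form (my own illustration; the error value ε = 0.3 is arbitrary):

```python
import numpy as np
from scipy.optimize import minimize_scalar

eps = 0.3  # weighted error of the current weak learner

# Exponential loss as a function of lambda: correctly classified points
# contribute exp(-lam)*(1 - eps), misclassified ones exp(lam)*eps.
loss = lambda lam: np.exp(-lam) * (1 - eps) + np.exp(lam) * eps

numeric = minimize_scalar(loss).x
analytic = 0.5 * np.log((1 - eps) / eps)
print(numeric, analytic)  # both approximately 0.4236
```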
