Lab #10: Demonstration of AdaBoost
CS109A Introduction to Data Science
Pavlos Protopapas, Kevin Rader, and Chris Tanner
Our Data

Training Data

  Chest Pain | Blocked Arteries | Patient Weight | Heart Disease
  -----------+------------------+----------------+--------------
  Yes        | Yes              | 205            | Yes
  No         | Yes              | 180            | Yes
  Yes        | No               | 210            | Yes
  Yes        | Yes              | 167            | Yes
  No         | Yes              | 156            | No
  No         | Yes              | 125            | No
  Yes        | No               | 168            | No
  Yes        | Yes              | 172            | No
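For concreteness, here is a minimal sketch of this toy dataset as a pandas DataFrame. The snake_case column names and the 1/0 encoding of Yes/No are editorial assumptions, not from the original slides; later sketches reuse X and y.

```python
import pandas as pd

# Toy training data from the slide; Yes/No mapped to 1/0 so
# scikit-learn estimators can consume the columns directly.
train = pd.DataFrame({
    "chest_pain":       [1, 0, 1, 1, 0, 0, 1, 1],
    "blocked_arteries": [1, 1, 0, 1, 1, 1, 0, 1],
    "patient_weight":   [205, 180, 210, 167, 156, 125, 168, 172],
    "heart_disease":    [1, 1, 1, 1, 0, 0, 0, 0],
})
X = train.drop(columns="heart_disease")
y = train["heart_disease"]
```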
Bagging

Do N times:
1. Shuffle (i.e., bootstrap) the data.
2. Train a new decision tree T_i.

We now have an ensemble {T_1, T_2, T_3, …, T_N}.

Testing Data

  Chest Pain | Blocked Arteries | Patient Weight | Heart Disease
  -----------+------------------+----------------+--------------
  No         | Yes              | 158            | ?

To classify the new patient, take a majority vote from {T_1, T_2, T_3, …, T_N} (a hand-rolled sketch follows below).
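A minimal sketch of this loop, reusing X and y from above (the ensemble size and the use of unpruned trees are illustrative assumptions):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n_trees = 100
trees = []
for _ in range(n_trees):
    # 1. Bootstrap: resample the rows with replacement
    idx = rng.integers(0, len(X), size=len(X))
    # 2. Train a new decision tree T_i on the bootstrap sample
    trees.append(DecisionTreeClassifier().fit(X.iloc[idx], y.iloc[idx]))

def bagged_predict(X_new):
    """Majority vote over {T_1, ..., T_N}."""
    votes = np.stack([t.predict(X_new) for t in trees])
    return (votes.mean(axis=0) >= 0.5).astype(int)
```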
Random Forest

Do N times:
1. Shuffle (i.e., bootstrap) the data.
2. Select a random subset of P_i features.
3. Train a new decision tree T_i.

We now have {T_1, T_2, T_3, …, T_N}. As with bagging, we classify the new patient in the testing data by taking a majority vote from {T_1, T_2, T_3, …, T_N} (see the sketch below).
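This recipe is what scikit-learn's RandomForestClassifier implements; a minimal sketch, where the hyperparameter values are illustrative and max_features="sqrt" is one common choice for the per-split feature subset:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Bootstrapping, random feature subsets, and majority voting
# are all handled internally.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
rf.fit(X, y)

# The query patient from the testing table: No, Yes, 158
query = pd.DataFrame([[0, 1, 158]], columns=X.columns)
print(rf.predict(query))
```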
Ideas

We have {T_1, T_2, T_3, …, T_N}.

"Fool me once, shame on … shame on you. Fool me – you can't get fooled again" –George W. Bush

"Fool me once, shame on you; fool me twice, shame on me" –Proverb

Let's learn from our mistakes!
Gradient Boosting

We have {T_1, T_2, T_3, …, T_N}. Each T_h is:
• a "weak"/simple decision tree
• built after the previous tree
• trained to learn the shortcomings (the errors/residuals) of the previous tree's predictions

(A from-scratch sketch follows below.)
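A minimal sketch of this residual-fitting loop on a synthetic regression target (the data, the depth-2 trees, and the shrinkage value are all illustrative assumptions):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X_r = rng.uniform(-3, 3, size=(200, 1))
y_r = np.sin(X_r[:, 0]) + rng.normal(scale=0.1, size=200)

lam = 0.1            # shrinkage / learning rate
F = np.zeros(200)    # running ensemble prediction T
trees = []
for _ in range(50):
    residuals = y_r - F                     # shortcomings of the current ensemble
    t = DecisionTreeRegressor(max_depth=2)  # a "weak"/simple tree
    t.fit(X_r, residuals)                   # fit the residuals, not y itself
    F += lam * t.predict(X_r)               # T <- T + lambda * T_h
    trees.append(t)
```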
Gradient Boosting: illustration
[Eight figure-only slides; the plots are not reproduced in this extraction.]
Gradient Boosting

We can determine each $\lambda_h$, the weight given to tree $T_h$ in the update $T \leftarrow T + \lambda_h T_h$, by using gradient descent.
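The slide proposes gradient descent; for squared-error loss, the same one-dimensional minimization also has a closed form. A hypothetical continuation of the loop above, solving $\arg\min_{\lambda} \sum_n (r_n - \lambda\, T_h(x_n))^2$ for the most recent tree:

```python
pred = t.predict(X_r)
# Setting the derivative of sum((residuals - lam * pred)**2) to zero
# gives the least-squares solution for this tree's weight:
lam_h = (residuals @ pred) / (pred @ pred)
```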
Idea

If we have categorical data (a classification task, not a regression task), we can use AdaBoost:
1. Train a single weak (stump) decision tree T_i.
2. Calculate the total error of your predictions.
3. Use this error ($\epsilon_i$) to determine how much stock ($\lambda_i$) to place in that tree.
4. Update the weights of each observation.
5. Update our running model T.

(See the sketch after this list.)
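These five steps are what scikit-learn's AdaBoostClassifier runs internally; a minimal sketch on the toy data. The hyperparameter values are illustrative, and the estimator keyword assumes scikit-learn >= 1.2 (older versions call it base_estimator):

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Depth-1 trees ("stumps") are the classic AdaBoost weak learner.
ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=50,
    learning_rate=1.0,
    random_state=0,
)
ada.fit(X, y)
```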
AdaBoost

With a minor adjustment, performing gradient descent on the exponential loss function gives the AdaBoost algorithm:

1. Choose an initial distribution over the training data: $w_n = 1/N$.
2. At the $i$-th step, fit a simple classifier $T^{(i)}$ on the weighted training data $\{(x_1, w_1 y_1), \ldots, (x_N, w_N y_N)\}$.
3. Update the weights: $w_n \leftarrow \frac{w_n \exp\left(-\lambda\, y_n T^{(i)}(x_n)\right)}{Z}$, where $Z$ is the normalizing constant for the collection of updated weights.
4. Update $T$: $T \leftarrow T + \lambda^{(i)} T^{(i)}$, where $\lambda$ is the learning rate.
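A from-scratch sketch of this loop with NumPy, reusing X and y from the DataFrame above. Labels are mapped to ±1 so the exponential-loss update applies, weighting is passed via sample_weight as a practical stand-in for fitting on $(x_n, w_n y_n)$, and the number of rounds is an illustrative choice:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

Xa = X.to_numpy()
ya = np.where(y.to_numpy() == 1, 1, -1)  # labels in {-1, +1}
N = len(ya)

w = np.ones(N) / N                       # step 1: w_n = 1/N
ensemble = []                            # pairs (lambda_i, stump)
for _ in range(20):
    # Step 2: fit a simple classifier (stump) on the weighted data
    stump = DecisionTreeClassifier(max_depth=1).fit(Xa, ya, sample_weight=w)
    pred = stump.predict(Xa)

    # Weighted error, clipped away from 0 and 1 on this tiny dataset,
    # and the optimal learning rate (see "Choosing the Learning Rate")
    eps = np.clip(np.sum(w * (pred != ya)), 1e-10, 1 - 1e-10)
    lam = 0.5 * np.log((1 - eps) / eps)

    # Step 3: update the weights and divide by the normalizer Z
    w = w * np.exp(-lam * ya * pred)
    w /= w.sum()

    # Step 4: T <- T + lambda_i * T_i
    ensemble.append((lam, stump))

def ada_predict(X_new):
    """Sign of the weighted sum of stump votes."""
    score = sum(lam * s.predict(X_new) for lam, s in ensemble)
    return np.sign(score)
```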
AdaBoost: illustration
1. Start with equal weights.
2. Fit a simple decision tree.
3. Update the weights.
4. Fit another simple decision tree on the re-weighted data.
5. Add the new model to the ensemble: $T \leftarrow T + \lambda^{(i)} T^{(i)}$.
6. Update the weights.
7. Fit a third simple decision tree on the re-weighted data.
8. Add the new model to the ensemble, and repeat: $T \leftarrow T + \lambda^{(i)} T^{(i)}$.
[Figure-only slides; the accompanying plots are not reproduced in this extraction.]
Choosing the Learning Rate

Unlike in the case of gradient boosting for regression, we can analytically solve for the optimal learning rate for AdaBoost, by optimizing

$$\lambda^{(i)} = \arg\min_{\lambda} \sum_{n=1}^{N} \exp\left(-y_n \left[ T(x_n) + \lambda\, T^{(i)}(x_n) \right]\right).$$

Doing so, we get

$$\lambda^{(i)} = \frac{1}{2} \ln\left(\frac{1 - \epsilon^{(i)}}{\epsilon^{(i)}}\right),$$

where $\epsilon^{(i)}$ is the weighted classification error of $T^{(i)}$ under the current weights.
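As a quick sanity check on this formula (the numbers are illustrative): a weak learner with weighted error $\epsilon^{(i)} = 0.25$ receives weight $\lambda^{(i)} = \tfrac{1}{2}\ln(0.75/0.25) = \tfrac{1}{2}\ln 3 \approx 0.55$; a coin-flip learner with $\epsilon^{(i)} = 0.5$ receives weight $0$; and a learner worse than chance ($\epsilon^{(i)} > 0.5$) receives negative weight, flipping its votes.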