Extending Decision Trees


  1. 1/20 Extending Decision Trees Alice Gao Lecture 10 Based on work by K. Leyton-Brown, K. Larson, and P. van Beek

  2. 2/20 Outline: Learning goals; Real-valued features; Noise and over-fitting; Revisiting the learning goals

  3. 3/20 Learning Goals By the end of the lecture, you should be able to:
     ▶ Construct decision trees with real-valued features.
     ▶ Construct a decision tree for noisy data to avoid over-fitting.
     ▶ Choose the best maximum depth of a decision tree by K-fold cross-validation.

  4. 4/20 Jeeves the valet - training set
     Day | Outlook  | Temp | Humidity | Wind   | Tennis?
     1   | Sunny    | Hot  | High     | Weak   | No
     2   | Sunny    | Hot  | High     | Strong | No
     3   | Overcast | Hot  | High     | Weak   | Yes
     4   | Rain     | Mild | High     | Weak   | Yes
     5   | Rain     | Cool | Normal   | Weak   | Yes
     6   | Rain     | Cool | Normal   | Strong | No
     7   | Overcast | Cool | Normal   | Strong | Yes
     8   | Sunny    | Mild | High     | Weak   | No
     9   | Sunny    | Cool | Normal   | Weak   | Yes
     10  | Rain     | Mild | Normal   | Weak   | Yes
     11  | Sunny    | Mild | Normal   | Strong | Yes
     12  | Overcast | Mild | High     | Strong | Yes
     13  | Overcast | Hot  | Normal   | Weak   | Yes
     14  | Rain     | Mild | High     | Strong | No

  5. 5/20 Jeeves the valet - test set
     Day | Outlook  | Temp | Humidity | Wind   | Tennis?
     1   | Sunny    | Mild | High     | Strong | No
     2   | Rain     | Hot  | Normal   | Strong | No
     3   | Rain     | Cool | High     | Strong | No
     4   | Overcast | Hot  | High     | Strong | Yes
     5   | Overcast | Cool | High     | Weak   | Yes
     6   | Rain     | Hot  | High     | Weak   | Yes
     7   | Overcast | Mild | Normal   | Weak   | Yes
     8   | Overcast | Cool | High     | Weak   | Yes
     9   | Rain     | Cool | High     | Weak   | Yes
     10  | Rain     | Mild | Normal   | Strong | No
     11  | Overcast | Mild | Normal   | Weak   | Yes
     12  | Sunny    | Mild | Normal   | Weak   | Yes
     13  | Sunny    | Cool | High     | Strong | No
     14  | Sunny    | Cool | High     | Weak   | No

  6. 6/20 Extending Decision Trees 1. Real-valued features 2. Noise and over-fitting

  7. 7/20 Jeeves dataset with real-valued temperatures
     Day | Outlook  | Temp | Humidity | Wind   | Tennis?
     1   | Sunny    | 29.4 | High     | Weak   | No
     2   | Sunny    | 26.6 | High     | Strong | No
     3   | Overcast | 28.3 | High     | Weak   | Yes
     4   | Rain     | 21.1 | High     | Weak   | Yes
     5   | Rain     | 20.0 | Normal   | Weak   | Yes
     6   | Rain     | 18.3 | Normal   | Strong | No
     7   | Overcast | 17.7 | Normal   | Strong | Yes
     8   | Sunny    | 22.2 | High     | Weak   | No
     9   | Sunny    | 20.6 | Normal   | Weak   | Yes
     10  | Rain     | 23.9 | Normal   | Weak   | Yes
     11  | Sunny    | 23.9 | Normal   | Strong | Yes
     12  | Overcast | 22.2 | High     | Strong | Yes
     13  | Overcast | 27.2 | Normal   | Weak   | Yes
     14  | Rain     | 21.7 | High     | Strong | No

  8. 8/20 Jeeves dataset ordered by temperatures
     Day | Outlook  | Temp | Humidity | Wind   | Tennis?
     7   | Overcast | 17.7 | Normal   | Strong | Yes
     6   | Rain     | 18.3 | Normal   | Strong | No
     5   | Rain     | 20.0 | Normal   | Weak   | Yes
     9   | Sunny    | 20.6 | Normal   | Weak   | Yes
     4   | Rain     | 21.1 | High     | Weak   | Yes
     14  | Rain     | 21.7 | High     | Strong | No
     8   | Sunny    | 22.2 | High     | Weak   | No
     12  | Overcast | 22.2 | High     | Strong | Yes
     10  | Rain     | 23.9 | Normal   | Weak   | Yes
     11  | Sunny    | 23.9 | Normal   | Strong | Yes
     2   | Sunny    | 26.6 | High     | Strong | No
     13  | Overcast | 27.2 | Normal   | Weak   | Yes
     3   | Overcast | 28.3 | High     | Weak   | Yes
     1   | Sunny    | 29.4 | High     | Weak   | No

  9. 9/20 Handling a real-valued feature ▶ Discretize it. ▶ Dynamically choose a split point.

  10. 10/20 Choosing a split point for a real-valued feature
     1. Sort the instances according to the real-valued feature.
     2. Possible split points are values that are midway between two different values.
     3. Suppose that the feature changes from X to Y. Should we consider (X + Y)/2 as a possible split point?
     4. Let L_X be all the labels for the examples where the feature takes the value X.
     5. Let L_Y be all the labels for the examples where the feature takes the value Y.
     6. If there exists a label a ∈ L_X and a label b ∈ L_Y such that a ≠ b, then we will consider (X + Y)/2 as a possible split point.
     7. Determine the expected information gain for each possible split point and choose the split point with the largest gain (see the sketch after this list).
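A minimal Python sketch of the procedure above, run on the real-valued Jeeves temperatures from slide 7. The helper names (entropy, candidate_split_points, best_split_point) are illustrative only, not from any library.

    from collections import Counter
    from math import log2

    # Jeeves training set: real-valued temperatures (days 1-14) and the Tennis? labels.
    temps  = [29.4, 26.6, 28.3, 21.1, 20.0, 18.3, 17.7, 22.2, 20.6, 23.9, 23.9, 22.2, 27.2, 21.7]
    labels = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

    def entropy(ys):
        counts = Counter(ys)
        return -sum((c / len(ys)) * log2(c / len(ys)) for c in counts.values())

    def candidate_split_points(xs, ys):
        """Steps 1-6: midpoints between adjacent distinct values whose labels differ."""
        values = sorted(set(xs))                         # step 1: sort by the feature
        points = []
        for x, y in zip(values, values[1:]):
            lx = {l for v, l in zip(xs, ys) if v == x}   # L_X
            ly = {l for v, l in zip(xs, ys) if v == y}   # L_Y
            if len(lx | ly) > 1:                         # some a in L_X, b in L_Y with a != b
                points.append((x + y) / 2)
        return points

    def best_split_point(xs, ys):
        """Step 7: pick the candidate with the largest expected information gain."""
        base = entropy(ys)
        best, best_gain = None, -1.0
        for p in candidate_split_points(xs, ys):
            left  = [l for v, l in zip(xs, ys) if v < p]
            right = [l for v, l in zip(xs, ys) if v >= p]
            remainder = len(left) / len(ys) * entropy(left) + len(right) / len(ys) * entropy(right)
            if base - remainder > best_gain:
                best, best_gain = p, base - remainder
        return best, best_gain

    print(best_split_point(temps, labels))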

  11. 11/20 CQ: Testing a discrete feature CQ: Suppose that feature X has discrete values (e.g., Temp is Cool, Mild, or Hot). On any path from the root to a leaf, how many times can we test feature X?
     (A) 0 times
     (B) 1 time
     (C) > 1 time
     (D) Two of (A), (B), and (C) are correct.
     (E) All of (A), (B), and (C) are correct.

  12. 12/20 CQ: Testing a real-valued feature CQ: Assume that we will do binary tests at each node in a decision tree. Suppose that feature X has real values (e.g., Temp ranges from 17.7 to 29.4). On any path from the root to a leaf, how many times can we test feature X?
     (A) 0 times
     (B) 1 time
     (C) > 1 time
     (D) Two of (A), (B), and (C) are correct.
     (E) All of (A), (B), and (C) are correct.

  13. 13/20 Jeeves the valet - training set (the same table as slide 4)

  14. 14/20 Decision tree generated by ID3
     Outlook
       Sunny -> Humidity
         High -> No
         Normal -> Yes
       Overcast -> Yes
       Rain -> Wind
         Weak -> Yes
         Strong -> No
     Test error is 0/14.

  15. 15/20 Jeeves training set is corrupted (the label for Day 3 is flipped from Yes to No)
     Day | Outlook  | Temp | Humidity | Wind   | Tennis?
     1   | Sunny    | Hot  | High     | Weak   | No
     2   | Sunny    | Hot  | High     | Strong | No
     3   | Overcast | Hot  | High     | Weak   | No
     4   | Rain     | Mild | High     | Weak   | Yes
     5   | Rain     | Cool | Normal   | Weak   | Yes
     6   | Rain     | Cool | Normal   | Strong | No
     7   | Overcast | Cool | Normal   | Strong | Yes
     8   | Sunny    | Mild | High     | Weak   | No
     9   | Sunny    | Cool | Normal   | Weak   | Yes
     10  | Rain     | Mild | Normal   | Weak   | Yes
     11  | Sunny    | Mild | Normal   | Strong | Yes
     12  | Overcast | Mild | High     | Strong | Yes
     13  | Overcast | Hot  | Normal   | Weak   | Yes
     14  | Rain     | Mild | High     | Strong | No

  16. 16/20 Decision tree for the corrupted data set
     Outlook
       Sunny -> Humidity
         High -> No
         Normal -> Yes
       Overcast -> Humidity
         High -> Wind
           Weak -> No
           Strong -> Yes
         Normal -> Yes
       Rain -> Wind
         Weak -> Yes
         Strong -> No
     Test error is 2/14.

  17. 17/20 Dealing with noisy data
     Problem: When the data is noisy, the ID3 algorithm grows the tree until it perfectly classifies the training examples, so over-fitting occurs. However, a smaller tree is likely to generalize to unseen data better.
     ▶ Grow the tree to a pre-specified maximum depth.
     ▶ Enforce a minimum number of examples at a leaf node.
     ▶ Post-prune the tree using a validation set.
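A rough mapping of these three remedies onto scikit-learn's DecisionTreeClassifier, as an illustration only: scikit-learn implements CART rather than the lecture's ID3, its built-in pruning is cost-complexity pruning rather than reduced-error pruning with a validation set, and the ordinal encoding of the Jeeves training set below is my own.

    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Jeeves training set, encoded ordinally:
    # Outlook: Sunny=0, Overcast=1, Rain=2; Temp: Hot=0, Mild=1, Cool=2;
    # Humidity: High=0, Normal=1; Wind: Weak=0, Strong=1; Tennis?: No=0, Yes=1.
    X = [[0, 0, 0, 0], [0, 0, 0, 1], [1, 0, 0, 0], [2, 1, 0, 0], [2, 2, 1, 0],
         [2, 2, 1, 1], [1, 2, 1, 1], [0, 1, 0, 0], [0, 2, 1, 0], [2, 1, 1, 0],
         [0, 1, 1, 1], [1, 1, 0, 1], [1, 0, 1, 0], [2, 1, 0, 1]]
    y = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0]

    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=1/3, random_state=0)

    # 1. Grow the tree to a pre-specified maximum depth.
    shallow = DecisionTreeClassifier(max_depth=2).fit(X_train, y_train)

    # 2. Enforce a minimum number of examples at a leaf node.
    leafy = DecisionTreeClassifier(min_samples_leaf=3).fit(X_train, y_train)

    # 3. Post-prune: compute the cost-complexity pruning path and keep the
    #    pruned tree that scores best on the held-out validation set.
    path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
    pruned = max(
        (DecisionTreeClassifier(ccp_alpha=a, random_state=0).fit(X_train, y_train)
         for a in path.ccp_alphas),
        key=lambda t: t.score(X_val, y_val),
    )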

  18. 18/20 Growing the tree to a maximum depth
     ▶ Randomly split the entire dataset into a training set and a validation set. (For example, 2/3 is the training set and 1/3 is the validation set.)
     ▶ For each pre-specified maximum depth, generate a tree with the maximum depth on the training set.
     ▶ Calculate the prediction accuracy of the generated tree on the validation set.
     ▶ Choose the maximum depth which results in the tree with the highest prediction accuracy (see the sketch below).
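A sketch of this holdout procedure with scikit-learn, using a synthetic stand-in dataset from make_classification rather than the Jeeves data:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=300, n_features=8, random_state=0)

    # Randomly split into 2/3 training and 1/3 validation.
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=1/3, random_state=0)

    best_depth, best_acc = None, 0.0
    for depth in range(1, 11):                          # pre-specified maximum depths to try
        tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
        tree.fit(X_train, y_train)                      # grow the tree on the training set
        acc = tree.score(X_val, y_val)                  # accuracy on the validation set
        if acc > best_acc:
            best_depth, best_acc = depth, acc

    print(f"best maximum depth: {best_depth} (validation accuracy {best_acc:.3f})")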

  19. 19/20 K-fold cross-validation Suppose that K = 5.
     1. For each pre-specified maximum depth, do steps 2 to 6.
     2. Split the data into 5 equal subsets.
     3. Perform 5 rounds of learning.
     4. In each round, 1/5 of the data is used as the validation set and 4/5 of the data is used as the training set.
     5. Over the 5 rounds, generate 5 different trees and determine their prediction accuracies on the 5 different data sets.
     6. Calculate the average prediction accuracy on the validation sets.
     7. Choose the maximum depth that results in the highest prediction accuracy on the validation sets (see the sketch below).
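The same depth search with 5-fold cross-validation instead of a single holdout split, again with scikit-learn and synthetic stand-in data; cross_val_score runs the 5 rounds of training and validation.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=300, n_features=8, random_state=0)

    best_depth, best_acc = None, 0.0
    for depth in range(1, 11):
        tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
        # cv=5: five rounds, each holding out 1/5 of the data for validation
        # and training on the remaining 4/5.
        scores = cross_val_score(tree, X, y, cv=5)
        if scores.mean() > best_acc:
            best_depth, best_acc = depth, scores.mean()

    print(f"best maximum depth: {best_depth} (mean validation accuracy {best_acc:.3f})")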

  20. 20/20 Revisiting the Learning Goals By the end of the lecture, you should be able to:
     ▶ Construct decision trees with real-valued features.
     ▶ Construct a decision tree for noisy data to avoid over-fitting.
     ▶ Choose the best maximum depth of a decision tree by K-fold cross-validation.
