Emil Brissman & Kajsa Eriksson, 2011-12-07

Agenda
- Background
- Techniques
- Example
- Applications
- Summary
Background: The problem
Decision trees:
- need to have low prediction error
- should not be overfitted to the training data
- should generally have low complexity, for interpretability

Background: Existing solutions
Pruning:
- a post-process that reduces overfitting and decision tree complexity
- risk of the tree becoming underfitted
Boosting:
- reduces prediction error by training a series of separate classifiers and combining their predictions
- the complexity of the resulting classifier increases drastically
Background: Grafting
- Goal: to show that a more complex tree can have a lower prediction error without being overfitted to the training data
- The idea is to reclassify regions of the instance space that contain no training data, or only misclassified data
- Reclassification increases the probability of correctly classifying data that fall into these empty regions

Techniques (1/2)
- There are four grafting algorithms, each built upon the previous one
- All are designed as a post-process to the C4.5 classification technique
- C4.5X is the first algorithm, developed just to test the theory of grafting
- C4.5+ is a formal grafting algorithm, developed because of the success of C4.5X
Techniques (2/2)
- C4.5++ is a further development that provably does not introduce overfitting in the tree; in other words, it balances the bias and variance of the tree
- C4.5A is the fourth and final algorithm, a performance update of the previous ones: by considering a smaller set of data, the computational time is reduced

Example (1/4)
Classification of the instance space after the C4.5 algorithm:
[Slide figure: the induced decision tree, with cuts at A > 7, A <= 2 and B <= 5, and leaves labelled ◊ and *]
Example (2/4)
- The blue region is a leaf of the induced decision tree that C4.5 classified as *
- But what is really the most likely class for the area marked with "?"
- By applying grafting as a post-process, a new prediction for that area can be made

Example (3/4)
- Step 1: For each leaf, the algorithm visits all ancestor nodes and tries to find candidate cuts that split the leaf region
- Step 2: It chooses the cuts with the highest Laplacian accuracy estimate (see the sketch below), where
  Laplace = (P + 1) / (T + 2)
  - T: number of instances below a given ancestor
  - P: number of instances of the majority class below the same ancestor
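A minimal sketch of steps 1 and 2, not Webb's original code: the class and method names (Cut, bestSupportedCut) are hypothetical, and the instance counts in main are invented for illustration. It only shows how candidate cuts gathered from a leaf's ancestors would be ranked by the Laplace estimate from the slide.

```java
import java.util.ArrayList;
import java.util.List;

public class GraftSketch {

    /** A candidate cut taken from an ancestor node, e.g. "A <= 7". */
    static class Cut {
        final String attribute;
        final double threshold;
        final int total;     // T: instances below the ancestor proposing the cut
        final int majority;  // P: instances of the majority class below that ancestor

        Cut(String attribute, double threshold, int total, int majority) {
            this.attribute = attribute;
            this.threshold = threshold;
            this.total = total;
            this.majority = majority;
        }

        /** Laplacian accuracy estimate from the slide: (P + 1) / (T + 2). */
        double laplace() {
            return (majority + 1.0) / (total + 2.0);
        }
    }

    /** Step 2: choose the best-supported cut among the candidates. */
    static Cut bestSupportedCut(List<Cut> candidates) {
        Cut best = null;
        for (Cut c : candidates) {
            if (best == null || c.laplace() > best.laplace()) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Step 1 (illustrative): candidate cuts collected from the leaf's ancestors.
        // The counts below are made up for the example.
        List<Cut> candidates = new ArrayList<>();
        candidates.add(new Cut("A", 7.0, 20, 17)); // Laplace = 18/22 ≈ 0.818
        candidates.add(new Cut("B", 5.0, 12, 11)); // Laplace = 12/14 ≈ 0.857

        Cut best = bestSupportedCut(candidates);
        System.out.printf("Best cut: %s <= %.1f, Laplace = %.3f%n",
                best.attribute, best.threshold, best.laplace());
    }
}
```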
Example (4/4)
- Step 3: The best-supported cuts are introduced into the decision tree as new branches, with leaves assigned the more likely class
- Result: 3 new leaves
  - a → ◊
  - b → *
  - c → ◊
- The region marked with "?" now belongs to class ◊

Applications
- Grafting as a post-process to C4.5 is implemented in Weka as J48graft (see the usage sketch below)
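A minimal usage sketch, assuming a Weka 3.6-era installation where the J48graft classifier is available; the ARFF file path is a placeholder, not part of the original slides.

```java
import weka.classifiers.trees.J48graft;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class J48graftDemo {
    public static void main(String[] args) throws Exception {
        // Load an ARFF dataset (placeholder path) and set the class attribute.
        Instances data = new DataSource("weather.nominal.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);

        // Build a grafted C4.5 tree: J48graft applies grafting as a
        // post-process to the pruned J48 (C4.5) tree.
        J48graft tree = new J48graft();
        tree.buildClassifier(data);

        // Print the resulting (grafted) decision tree.
        System.out.println(tree);
    }
}
```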
Summary
- Grafting is a post-process that successfully reduces the prediction error of a decision tree by re-evaluating areas of the instance space where no training data exists
- It has been shown that the increased complexity of a grafted tree does not mean that the tree is more overfitted
- Grafting combined with pruning most often gives even better results, probably because the two techniques complement each other

Bibliography
- Kumar, V., Steinbach, M. & Tan, P.-N. (2006). Introduction to Data Mining. Pearson College Div.
- Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. Los Altos: Morgan Kaufmann.
- University of Waikato. Weka 3: Data Mining Software in Java. http://www.cs.waikato.ac.nz/ml/weka/index.html [2011-12-01]
- Webb, G. I. (1996). Further Experimental Evidence against the Utility of Occam's Razor. Journal of Artificial Intelligence Research, vol. 4, pp. 397-417.
- Webb, G. I. (1997). Decision Tree Grafting. IJCAI'97: Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence, vol. 2, pp. 846-851.
- Webb, G. I. (1999). Decision Tree Grafting from the All-Tests-But-One Partition. IJCAI'99: Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, vol. 2, pp. 702-707.