CISC 4631 Data Mining Lecture 04: • Decision Trees These slides are based on the slides by • Tan, Steinbach and Kumar (textbook authors) • Eamonn Keogh (UC Riverside) • Raymond Mooney (UT Austin) 1
Classification: Definition • Given a collection of records (training set) – Each record contains a set of attributes; one of the attributes is the class. • Find a model for the class attribute as a function of the values of the other attributes. • Goal: previously unseen records should be assigned a class as accurately as possible. – A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it. 2
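A minimal sketch of this train/test workflow, assuming scikit-learn is available; the iris dataset and the 70/30 split are illustrative choices, not part of the slides.

```python
# Illustrative only: build a model on a training set, then measure its
# accuracy on a held-out test set of previously unseen records.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                     # any labeled data set works here

# Divide the labeled records into a training set and a test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)   # build the model (induction)
y_pred = model.predict(X_test)                            # apply it to unseen records
print("Test accuracy:", accuracy_score(y_test, y_pred))
```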
Illustrating Classification Task [Figure: a training set of records (Tid, Attrib1, Attrib2, Attrib3, Class) is fed to a learning algorithm, which learns a model (induction); the model is then applied to a test set whose class labels are unknown (deduction).] 3
Classification Techniques • Decision Tree based Methods • Rule-based Methods • Memory-based Reasoning • Neural Networks • Naïve Bayes and Bayesian Belief Networks • Support Vector Machines 4
Example of a Decision Tree [Figure: training data with attributes Refund, Marital Status (MarSt), Taxable Income (TaxInc) and class label Cheat, next to the induced model. Splitting attributes: the root tests Refund (Yes → leaf NO; No → test MarSt); MarSt = Married → leaf NO; MarSt = Single or Divorced → test TaxInc (< 80K → leaf NO, > 80K → leaf YES).] Model: Decision Tree. 5
Another Example of Decision Tree [Figure: the same training data, but a different tree: the root tests MarSt (Married → leaf NO; Single or Divorced → test Refund); Refund = Yes → leaf NO; Refund = No → test TaxInc (< 80K → NO, > 80K → YES).] There could be more than one tree that fits the same data! 6
Decision Tree Classification Task [Figure: the same induction/deduction workflow, now with the model shown as a decision tree: a tree induction algorithm learns a decision tree from the training set, and that tree is then applied to the test set.] 7
Apply Model to Test Data [Figure sequence, slides 8–13: the test record (Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?) is pushed through the tree step by step. Start from the root of the tree: Refund = No, so follow the No branch to MarSt; Marital Status = Married, so follow the Married branch to the leaf NO; assign Cheat to "No".]
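A small sketch, not taken from the slides, that encodes the example tree as plain Python and traverses it for this test record; the function name and dictionary keys are illustrative.

```python
def classify(record):
    # Walk the Refund -> MarSt -> TaxInc tree from the root down to a leaf.
    if record["Refund"] == "Yes":
        return "No"                                   # leaf NO: does not cheat
    if record["Marital Status"] == "Married":
        return "No"                                   # leaf NO
    # Single or Divorced: test taxable income (threshold 80K from the slide)
    return "No" if record["Taxable Income"] < 80_000 else "Yes"

test_record = {"Refund": "No", "Marital Status": "Married", "Taxable Income": 80_000}
print(classify(test_record))                          # -> "No", i.e. assign Cheat = No
```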
Decision Tree Terminology 14
Decision Tree Classification Task [Figure: repeat of the induction/deduction workflow from slide 7, shown again before discussing how the tree itself is induced.] 15
Decision Tree Induction • Many algorithms: – Hunt’s Algorithm (one of the earliest) – CART – ID3, C4.5 – SLIQ, SPRINT • John Ross Quinlan is a computer science researcher in data mining and decision theory. He has contributed extensively to the development of decision tree algorithms, including inventing the canonical ID3 and C4.5 algorithms. 16
Decision Tree Classifier [Figure: Ross Quinlan; a scatter plot of insects with Abdomen Length on the x-axis and Antenna Length on the y-axis (both 1–10), and the corresponding tree: Abdomen Length > 7.1? yes → Katydid; no → Antenna Length > 6.0? yes → Katydid; no → Grasshopper.] 17
Decision trees predate computers. [Figure: a field key for insects drawn as a decision tree, using tests such as "Antennae shorter than body?", "3 Tarsi?", and "Foretiba has ears?" to reach the leaves Grasshopper, Cricket, Katydids, and Camel Cricket.] 18
Definition A decision tree is a classifier in the form of a tree structure – Decision node: specifies a test on a single attribute – Leaf node: indicates the value of the target attribute – Arc/edge: one outcome of the split on an attribute – Path: a conjunction of attribute tests that leads to the final decision Decision trees classify instances or examples by starting at the root of the tree and moving through it until a leaf node is reached. 19
Decision Tree Classification • Decision tree generation consists of two phases – Tree construction • At start, all the training examples are at the root • Partition examples recursively based on selected attributes – Tree pruning • Identify and remove branches that reflect noise or outliers • Use of decision tree: Classifying an unknown sample – Test the attribute values of the sample against the decision tree 20
Decision Tree Representation • Each internal node tests an attribute • Each branch corresponds to an attribute value • Each leaf node assigns a classification [Figure: the "play tennis" tree: outlook = sunny → test humidity (high → no, normal → yes); outlook = overcast → yes; outlook = rain → test wind (strong → no, weak → yes).] 21
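A minimal sketch of this representation as nested Python dictionaries, assuming the tree drawn on the slide; internal nodes map an attribute to its value branches, and leaves are plain class labels.

```python
# The "play tennis" tree from the slide, written as nested dictionaries:
# each internal node tests one attribute, each key under it is a branch
# for one attribute value, and each leaf is a class label.
tree = {
    "outlook": {
        "sunny":    {"humidity": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rain":     {"wind": {"strong": "no", "weak": "yes"}},
    }
}

def predict(node, example):
    # Follow attribute-value branches until a leaf (a plain string) is reached.
    while isinstance(node, dict):
        attribute = next(iter(node))                 # the attribute tested at this node
        node = node[attribute][example[attribute]]   # take the branch for its value
    return node

print(predict(tree, {"outlook": "sunny", "humidity": "high", "wind": "weak"}))  # -> "no"
```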
How do we construct the decision tree? • Basic algorithm (a greedy algorithm) – Tree is constructed in a top-down recursive divide-and-conquer manner – At start, all the training examples are at the root – Attributes are categorical (if continuous-valued, they can be discretized in advance) – Examples are partitioned recursively based on selected attributes. – Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain) • Conditions for stopping partitioning – All samples for a given node belong to the same class – There are no remaining attributes for further partitioning – majority voting is employed for classifying the leaf – There are no samples left 22
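A compact sketch of the greedy, top-down, divide-and-conquer procedure just described, for categorical attributes; choose_best_attribute stands in for whatever heuristic is used (e.g. information gain) and is assumed rather than defined here.

```python
from collections import Counter

def build_tree(examples, attributes, choose_best_attribute, default="?"):
    """examples: list of (attribute_dict, class_label) pairs; attributes: list of names."""
    if not examples:                                    # no samples left
        return default
    labels = [label for _, label in examples]
    if len(set(labels)) == 1:                           # all samples belong to the same class
        return labels[0]
    if not attributes:                                  # no attributes left: majority voting
        return Counter(labels).most_common(1)[0][0]
    best = choose_best_attribute(examples, attributes)  # heuristic / statistical measure
    tree = {best: {}}
    for value in {attrs[best] for attrs, _ in examples}:
        subset = [(a, c) for a, c in examples if a[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = build_tree(subset, remaining, choose_best_attribute, default)
    return tree
```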
Top-Down Decision Tree Induction • Main loop: 1. A ← the “best” decision attribute for the next node 2. Assign A as the decision attribute for the node 3. For each value of A, create a new descendant of the node 4. Sort the training examples to the leaf nodes 5. If the training examples are perfectly classified, then STOP; else iterate over the new leaf nodes 23
Tree Induction • Greedy strategy – Split the records based on an attribute test that optimizes a certain criterion. • Issues – Determine how to split the records • How to specify the attribute test condition? • How to determine the best split? – Determine when to stop splitting 24
How To Split Records • Random split – The tree can grow huge – Such trees are hard to understand – Larger trees are typically less accurate than smaller trees • Principled criterion – Selection of an attribute to test at each node: choosing the most useful attribute for classifying examples – How? – Information gain • Measures how well a given attribute separates the training examples according to their target classification • This measure is used to select among the candidate attributes at each step while growing the tree 25
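A hedged sketch of information gain with entropy, in the same (attribute_dict, class_label) representation assumed above; the formulas are the standard ones, not code from the slides.

```python
import math
from collections import Counter

def entropy(labels):
    # Entropy of a class distribution: -sum of p_i * log2(p_i).
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(examples, attribute):
    # Entropy before the split minus the weighted entropy of the children.
    labels = [label for _, label in examples]
    before = entropy(labels)
    after = 0.0
    for value in {attrs[attribute] for attrs, _ in examples}:
        child = [label for attrs, label in examples if attrs[attribute] == value]
        after += (len(child) / len(examples)) * entropy(child)
    return before - after
```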
Tree Induction • Greedy strategy: – Split the records based on an attribute test that optimizes a certain criterion. – Hunt’s algorithm: recursively partition training records into successively purer subsets. How to measure purity/impurity? • Entropy and information gain (covered in the lecture slides) • Gini index (covered in the textbook) • Classification error 26
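A small sketch, not from the slides, that computes all three impurity measures from the class labels at a node, using the standard textbook formulas.

```python
import math
from collections import Counter

def impurities(labels):
    # Class proportions p_i at this node.
    total = len(labels)
    p = [count / total for count in Counter(labels).values()]
    entropy = -sum(pi * math.log2(pi) for pi in p if pi > 0)
    gini = 1.0 - sum(pi * pi for pi in p)
    classification_error = 1.0 - max(p)
    return entropy, gini, classification_error

# Example: a node holding 7 "No" records and 3 "Yes" records.
print(impurities(["No"] * 7 + ["Yes"] * 3))   # ≈ (0.881, 0.420, 0.300)
```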