Introduction to Artificial Intelligence
Classification Algorithms: Decision Trees and Overfitting
Miłosz Kadziński
Institute of Computing Science, Poznan University of Technology, Poland
www.cs.put.poznan.pl/mkadzinski/iai
Classification
Input
  - A fixed set of class or category labels C = {C_1, C_2, …, C_n}
  - A training set of m hand-labeled objects {(O_1, C_2), …, (O_m, C_j)}
  - An object O_x ∈ X to be classified
Output
  - A learned classifier predicts class c(O_x) for O_x, where c(O_x) ∈ C and c is a function whose domain is X and whose range is C
Aim
  - Create a model and use it to classify new data (i.e., predict a discrete, nominal value: a category/class)
Classification algorithms
  - Bayesian
  - Decision trees
  - Distance-based
  - Neural networks
  - Genetic
  - Association-based
  - Support Vector Machines
Examples of Classification (1)
Classification: predicting class as a function of the values of other attributes
Credit risk assessment and fraud detection
  - A credit card company typically receives hundreds of thousands of applications for new cards.
  - Each application contains information on several different attributes, such as annual salary, any outstanding debts, age, etc.
  - The problem is to categorize applicants into those who have good credit, those who have bad credit, and those who fall into a gray area.
Medical diagnosis and treatment effectiveness
  - An emergency room in a hospital measures 17 variables (e.g., blood pressure, age, etc.) of newly admitted patients.
  - A decision has to be taken whether to put the patient in an intensive-care unit (ICU).
  - Due to the high cost of the ICU, we need to predict high-risk patients and discriminate them from low-risk patients.
Examples of Classification (2)
Classification: predicting class as a function of the values of other attributes
Industrial applications
  - A nuclear fuel processing company wishes to improve the yield of its factories.
  - In one such factory, uranium hexafluoride gas is converted into uranium-dioxide pellets. Six processing steps are needed to do the conversion.
  - There are 30 controllable variables (e.g., pressure and flow rates, temperature).
  - Engineers note that yield is high on some days and low on others. How can they control the variables to produce high yield on all days?
Star classification
  - Astronomers have been cataloguing distant objects in the sky using long-exposure images.
  - The objects need to be labeled as star, galaxy, nebula, etc.
  - The data is highly noisy, and the images are very faint. The cataloguing can take decades to complete.
  - How can physicists automate the cataloguing process and improve its effectiveness?
Examples of Classification (3) – Our Running Example
Attributes / characteristics / features: Income, AI student, Sex
Decision attribute (classification; class = yes or class = no): Buy iPhone XI?
Objects / instances / items: rows 1 to 8

  No.  Income  AI student  Sex     Buy iPhone XI?
  1    medium  yes         male    yes
  2    medium  yes         female  yes
  3    high    yes         female  yes
  4    low     yes         male    no
  5    low     yes         female  no
  6    low     no          female  no
  7    medium  no          male    no
  8    medium  no          female  no

Example (object 1): if income = medium and AI student = yes and sex = male then yes
Aim: represent the classification with a decision tree
Decision Tree – How to Interpret It?
Directed tree = directed acyclic graph whose underlying undirected graph is a tree (a graph in which any two vertices are connected by exactly one path)
Vertices
  - root: the topmost node (here: income)
  - internal node: tests an attribute
  - branch: an attribute value; leads to a node at the lower level
  - leaf (node): assigns a classification

Training data (I = income, AI = AI student, S = sex, iPh? = buy iPhone):

  No.  I  AI  S  iPh?
  1    M  Y   M  Y
  2    M  Y   F  Y
  3    H  Y   F  Y
  4    L  Y   M  N
  5    L  Y   F  N
  6    L  N   F  N
  7    M  N   M  N
  8    M  N   F  N

Decision tree:

  income
    low    -> N
    medium -> AI stud.
                yes -> Y
                no  -> N
    high   -> Y

A classification tree needs to represent a division of the set of objects into classes.
Internal nodes = means of performing such a division; leaves = classes
Decision Tree vs. Decision Rules
A decision tree can alternatively be represented in the form of decision rules.
Each path leading from the root to some leaf corresponds to a rule.

  income
    low    -> N
    medium -> AI stud.
                yes -> Y
                no  -> N
    high   -> Y

  if income = low then buy = no
  if income = high then buy = yes
  if (income = medium and AI student = yes) then buy = yes
  if (income = medium and AI student = no) then buy = no

For each class, the decision tree represents a disjunction (∨) of conjunctions (∧) of constraints on the values of attributes:
  yes (Y): (income = high) ∨ (income = medium ∧ AI student = yes)
  no (N): (income = low) ∨ (income = medium ∧ AI student = no)
Both the rules and the conjunctions/disjunctions can possibly be simplified (see no (N)).
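The correspondence between root-to-leaf paths and rules can be checked in a few lines. The following is a minimal sketch in Python, not part of the original slides: the names `buy_by_rules` and `buy_by_dnf`, and the lowercase string encoding of attribute values, are assumptions made for illustration.

```python
# Training objects 1-8 from the running example: (income, AI student, sex, buy).
TRAINING = [
    ("medium", "yes", "male", "yes"),
    ("medium", "yes", "female", "yes"),
    ("high", "yes", "female", "yes"),
    ("low", "yes", "male", "no"),
    ("low", "yes", "female", "no"),
    ("low", "no", "female", "no"),
    ("medium", "no", "male", "no"),
    ("medium", "no", "female", "no"),
]


def buy_by_rules(income: str, ai_student: str) -> str:
    """Apply the four rules, one per root-to-leaf path of the example tree."""
    if income == "low":
        return "no"
    if income == "high":
        return "yes"
    if income == "medium" and ai_student == "yes":
        return "yes"
    if income == "medium" and ai_student == "no":
        return "no"
    raise ValueError(f"no rule covers income={income!r}, ai_student={ai_student!r}")


def buy_by_dnf(income: str, ai_student: str) -> str:
    """Same decision, written as the disjunction of conjunctions for class yes."""
    is_yes = income == "high" or (income == "medium" and ai_student == "yes")
    return "yes" if is_yes else "no"


# Both forms reproduce the class of every training object; sex is never consulted.
for income, ai_student, _sex, buy in TRAINING:
    assert buy_by_rules(income, ai_student) == buy
    assert buy_by_dnf(income, ai_student) == buy
```

Unlike the rule form, the DNF form only tests membership in class yes, so any object outside it, including one with an unseen attribute value, falls to no.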
Decision Tree – Use for Classification
New objects with unknown classification are classified by traversing the tree:
  - Start from the root
  - Check the value of the attribute associated with the current node
  - Move to the next node, following the branch that corresponds to the object's attribute value
  - Repeat until you reach a leaf indicating a class assignment

Decision tree (START HERE at the root; training objects reaching each leaf in braces):

  income
    low    -> N {4,5,6}
    medium -> AI stud.
                yes -> Y {1,2}
                no  -> N {7,8}
    high   -> Y {3}

Example new objects:

  No.  I  AI  S  iPh?
  9    H  N   M  ?
  10   M  N   F  ?

In classifying any object, the tree may not use all attributes in the table.
Some attributes (see sex (S)) may not have any influence on the buying decision (according to the constructed decision tree).
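The traversal procedure can be sketched in Python as follows. This is only an illustration under stated assumptions: the nested-dictionary encoding of the tree, the key names `attribute` and `branches`, and the function name `classify` are hypothetical choices, not something prescribed by the slides.

```python
# The example tree as nested dictionaries: an internal node names the attribute
# it tests and maps each attribute value (branch) to a child; a leaf is a class label.
TREE = {
    "attribute": "income",
    "branches": {
        "low": "no",
        "high": "yes",
        "medium": {
            "attribute": "ai_student",
            "branches": {"yes": "yes", "no": "no"},
        },
    },
}


def classify(tree, obj):
    """Walk from the root to a leaf, following the branch for each tested attribute."""
    node = tree
    while isinstance(node, dict):          # still at an internal node
        value = obj[node["attribute"]]     # check the attribute tested at this node
        node = node["branches"][value]     # follow the matching branch
    return node                            # reached a leaf: the class assignment


# New objects 9 and 10 from the slide.
obj9 = {"income": "high", "ai_student": "no", "sex": "male"}
obj10 = {"income": "medium", "ai_student": "no", "sex": "female"}

print(classify(TREE, obj9))   # yes
print(classify(TREE, obj10))  # no
```

Object 9 is classified after a single test (income = high), and neither object has its sex attribute inspected, which matches the remark that the tree may not use all attributes.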
Decision Tree – Characteristics
Decision trees are one of the most widely used and practical methods of inference based on decision examples.
  - Instances can be described by attribute-value pairs
  - Learned functions are represented as decision trees (or if-then-else rules)
  - Expressive hypothesis space, including disjunction (disjunctive hypotheses)
  - Training data samples may be noisy: the method is robust to errors in training data
  - Intuitive, easily understandable, human readable
  - A sequence of conditions guiding decision making is used in many domains
The most famous decision tree algorithms
  - ID3 (Iterative Dichotomiser 3)
  - C4 and C4.5 (successors of ID3)
  - CART (Classification and Regression Tree)
ID3 – The Famous Algorithm for Learning Decision Trees
ID3 is a basic algorithm for learning decision trees (DTs), introduced by John Ross Quinlan (ID3, 1986).
  - Given a training set of examples, the algorithm performs a search in the space of decision trees
  - The construction of the tree is top-down (root → leaves)
  - The algorithm is greedy
ID3 steps
  1. A ← select the "best" attribute for the next node
  2. For each value of A, create a branch and a descendant node (working leaf)
  3. Partition the training examples among the leaf nodes according to the attribute value of the branch
  4. If all training examples in a given leaf are perfectly classified (the same value of the target attribute = class) or there are no attributes left, stop and create the leaf node indicating the respective class
  5. Otherwise, iterate over the new successor leaf nodes (recursively construct a sub-tree for each partition using ID3)
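The recursive procedure can be sketched in Python on the running example. This is a minimal sketch, not the author's implementation, and it rests on several assumptions. The slides defer the definition of the "best" attribute, so the sketch uses entropy-based information gain, the criterion commonly associated with ID3. When no attributes are left, it returns the majority class, which is one possible reading of "the respective class". Branches are created only for attribute values that occur in the current partition. All names (`ROWS`, `entropy`, `information_gain`, `id3`) are hypothetical.

```python
import math
from collections import Counter

COLUMNS = ("income", "ai_student", "sex", "buy")
DATA = [
    ("medium", "yes", "male", "yes"),
    ("medium", "yes", "female", "yes"),
    ("high", "yes", "female", "yes"),
    ("low", "yes", "male", "no"),
    ("low", "yes", "female", "no"),
    ("low", "no", "female", "no"),
    ("medium", "no", "male", "no"),
    ("medium", "no", "female", "no"),
]
ROWS = [dict(zip(COLUMNS, row)) for row in DATA]


def entropy(rows, target):
    """Shannon entropy (in bits) of the class distribution in rows."""
    counts = Counter(r[target] for r in rows)
    n = len(rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def information_gain(rows, attr, target):
    """Entropy reduction obtained by partitioning rows on attr (assumed criterion)."""
    n = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / n * entropy(subset, target)
    return entropy(rows, target) - remainder


def id3(rows, attributes, target):
    """Greedy, top-down construction of a decision tree as nested dictionaries."""
    classes = [r[target] for r in rows]
    # Stop: perfectly classified partition, or no attributes left (majority class).
    if len(set(classes)) == 1 or not attributes:
        return Counter(classes).most_common(1)[0][0]
    # Step 1: select the "best" attribute for this node.
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    # Steps 2-5: one branch per observed value, then recurse on each partition.
    branches = {}
    for value in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == value]
        branches[value] = id3(subset, remaining, target)
    return {"attribute": best, "branches": branches}


tree = id3(ROWS, ["income", "ai_student", "sex"], "buy")
print(tree)
```

Under these assumptions the sketch selects income at the root and AI student in the medium branch, and never tests sex.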
ID3 – Example (1)
Start from an empty tree.
Let us assume "income" is the "best" attribute for the next node (Why? See later…).

Training data:

  No.  I  AI  S  iPh?
  1    M  Y   M  Y
  2    M  Y   F  Y
  3    H  Y   F  Y
  4    L  Y   M  N
  5    L  Y   F  N
  6    L  N   F  N
  7    M  N   M  N
  8    M  N   F  N

For each value of "income", create a new branch and a descendant node:

  income
    low    -> {4 (N), 5 (N), 6 (N)}
    medium -> {1 (Y), 2 (Y), 7 (N), 8 (N)}
    high   -> {3 (Y)}

  - low and high, perfect classification: create the leaf nodes indicating the respective classes
  - medium, imperfect classification and other attributes still available: iterate over the successive leaf node

  income
    low    -> N {4 (N), 5 (N), 6 (N)}
    medium -> {1 (Y), 2 (Y), 7 (N), 8 (N)}   (construct a sub-tree for this partition)
    high   -> Y {3 (Y)}
ID3 – Example (2)
We don't have an empty tree anymore. Expand what needs to be expanded: {1 (Y), 2 (Y), 7 (N), 8 (N)}
Let us assume "AI student" is the "best" decision attribute for the next node.

Training data in the partition income = medium:

  No.  I  AI  S  iPh?
  1    M  Y   M  Y
  2    M  Y   F  Y
  7    M  N   M  N
  8    M  N   F  N

For each value of "AI student", create a new branch and a descendant node:

  income
    low    -> N {4 (N), 5 (N), 6 (N)}
    medium -> AI student
                yes -> Y {1 (Y), 2 (Y)}
                no  -> N {7 (N), 8 (N)}
    high   -> Y {3 (Y)}

All training examples for each value of "AI student" are perfectly classified.
Perfect classification: create the leaf nodes indicating the respective classes.
HOW TO SELECT THE BEST DECISION ATTRIBUTE FOR A GIVEN NODE?