
Data Mining Techniques: Classification and Prediction



  1. Data Mining Techniques: Classification and Prediction
     Mirek Riedewald
     Some slides based on presentations by Han/Kamber, Tan/Steinbach/Kumar, and Andrew Moore

     Classification and Prediction Overview
     • Introduction
     • Decision Trees
     • Statistical Decision Theory
     • Nearest Neighbor
     • Bayesian Classification
     • Artificial Neural Networks
     • Support Vector Machines (SVMs)
     • Prediction
     • Accuracy and Error Measures
     • Ensemble Methods

     Classification vs. Prediction
     • Assumption: after data preparation, we have a single data set where each record has attributes X1,…,Xn, and Y.
     • Goal: learn a function f: (X1,…,Xn) → Y, then use this function to predict y for a given input record (x1,…,xn).
       – Classification: Y is a discrete attribute, called the class label; usually a categorical attribute with a small domain.
       – Prediction: Y is a continuous attribute.
     • This is called supervised learning, because the true labels (Y-values) are known for the initially provided data.
     • Typical applications: credit approval, target marketing, medical diagnosis, fraud detection.

     Induction: Model Construction
     The classification algorithm constructs a model (function) from the training data; see the sketch after this section.

       NAME   RANK            YEARS   TENURED
       Mike   Assistant Prof  3       no
       Mary   Assistant Prof  7       yes
       Bill   Professor       2       yes
       Jim    Associate Prof  7       yes
       Dave   Assistant Prof  6       no
       Anne   Associate Prof  3       no

     Learned model: IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’

     Deduction: Using the Model
     The model is applied to test data, and then to unseen data such as (Jeff, Professor, 4) → Tenured?

       NAME     RANK            YEARS   TENURED
       Tom      Assistant Prof  2       no
       Merlisa  Associate Prof  7       no
       George   Professor       5       yes
       Joseph   Assistant Prof  7       yes
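To make the induction/deduction loop concrete, here is a minimal sketch in Python. It hard-codes the learned rule from the slide as a function; the name predict_tenured and the data layout are our own choices, not from the slides.

```python
# Minimal sketch: apply the induced model
# (IF rank = 'professor' OR years > 6 THEN tenured = 'yes')
# to the test set. Function and variable names are illustrative.

def predict_tenured(rank: str, years: int) -> str:
    """Model induced from the training data above."""
    return "yes" if rank == "Professor" or years > 6 else "no"

test_data = [  # (NAME, RANK, YEARS, true TENURED label)
    ("Tom",     "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George",  "Professor",      5, "yes"),
    ("Joseph",  "Assistant Prof", 7, "yes"),
]

for name, rank, years, label in test_data:
    print(name, predict_tenured(rank, years), "(actual:", label + ")")

# Unseen record from the slide:
print("Jeff:", predict_tenured("Professor", 4))  # -> yes
```

Note that the rule misclassifies Merlisa (years = 7, yet not tenured), a reminder that a model's accuracy has to be measured on test data; the Accuracy and Error Measures section returns to this.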

  2. Example of a Decision Tree
     Training data:

       Tid  Refund  Marital Status  Taxable Income  Cheat
       1    Yes     Single          125K            No
       2    No      Married         100K            No
       3    No      Single          70K             No
       4    Yes     Married         120K            No
       5    No      Divorced        95K             Yes
       6    No      Married         60K             No
       7    Yes     Divorced        220K            No
       8    No      Single          85K             Yes
       9    No      Married         75K             No
       10   No      Single          90K             Yes

     Model: a decision tree with splitting attributes Refund, MarSt, and TaxInc:
     • Refund = Yes → NO
     • Refund = No → test MarSt:
       – MarSt = Married → NO
       – MarSt = Single or Divorced → test TaxInc:
         • TaxInc < 80K → NO
         • TaxInc > 80K → YES

     Another Example of a Decision Tree
     A different tree fits the same training data:
     • MarSt = Married → NO
     • MarSt = Single or Divorced → test Refund:
       – Refund = Yes → NO
       – Refund = No → test TaxInc: < 80K → NO, > 80K → YES
     There could be more than one tree that fits the same data!

     Apply Model to Test Data
     Test record: Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?
     Start from the root of the tree: Refund = No, so follow the No branch to MarSt; MarSt = Married, so follow the Married branch. (A sketch of this tree as code follows below.)
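The tree walk on these slides can be written down directly as nested conditionals. This is a sketch of the first tree; the function name is our own, and the exact-80K boundary case is resolved arbitrarily since the slide only labels the branches < 80K and > 80K.

```python
# Sketch of the first decision tree: Refund -> MarSt -> TaxInc.
def classify_cheat(refund: str, marital_status: str, taxable_income: float) -> str:
    if refund == "Yes":
        return "No"
    # Refund = No: split on marital status.
    if marital_status == "Married":
        return "No"
    # Single or Divorced: split on taxable income (in thousands).
    # The slide labels the branches < 80K / > 80K; we send the boundary high.
    return "No" if taxable_income < 80 else "Yes"

# Test record from the slides: never reaches the TaxInc test.
print(classify_cheat("No", "Married", 80))  # -> "No"
```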

  3. Apply Model to Test Data (continued)
     The Married branch leads directly to a NO leaf, so we assign Cheat = “No” for the test record.

     Decision Tree Induction
     • Basic greedy algorithm (sketched in code after this section):
       – Top-down, recursive divide-and-conquer.
       – At the start, all training records are at the root.
       – Training records are partitioned recursively based on split attributes.
       – Split attributes are selected based on a heuristic or statistical measure (e.g., information gain).
     • Conditions for stopping the partitioning:
       – Pure node (all records belong to the same class).
       – No remaining attributes for further partitioning; use majority voting to classify the leaf.
       – No cases left.

     Decision Boundary
     [Figure: a two-dimensional data set over attributes x1 and x2, partitioned by the tree X1 < 0.43?, then X2 < 0.33? or X2 < 0.47?, into axis-parallel rectangular class regions.]
     • Decision boundary = border between two neighboring regions of different classes.
     • For trees that split on a single attribute at a time, the decision boundary is parallel to the axes.

     How to Specify the Split Condition?
     • Depends on the attribute type:
       – Nominal
       – Ordinal
       – Numeric (continuous)
     • Depends on the number of ways to split:
       – 2-way split
       – Multi-way split

     Splitting Nominal Attributes
     • Multi-way split: use as many partitions as there are distinct values, e.g., CarType → {Family}, {Sports}, {Luxury}.
     • Binary split: divides the values into two subsets; need to find the optimal partitioning, e.g., {Sports, Luxury} vs. {Family}, or {Family, Luxury} vs. {Sports}.
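Here is a minimal sketch of the greedy induction algorithm just described, assuming categorical attributes (records are tuples with the class label last, attributes identified by position) and using weighted entropy, i.e. information gain, as the selection measure. All names and the dict-based tree encoding are our own.

```python
import math
from collections import Counter

def entropy(records):
    """Impurity of a record set: -sum p_i log2 p_i over class labels (last field)."""
    counts = Counter(r[-1] for r in records)
    n = len(records)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def best_attribute(records, attributes):
    """Attribute whose multi-way split minimizes weighted entropy (max gain)."""
    def split_cost(a):
        groups = Counter(r[a] for r in records)
        return sum(cnt / len(records) * entropy([r for r in records if r[a] == v])
                   for v, cnt in groups.items())
    return min(attributes, key=split_cost)

def build_tree(records, attributes):
    labels = {r[-1] for r in records}
    if len(labels) == 1:                    # pure node: stop
        return labels.pop()
    if not attributes:                      # no attributes left: majority vote
        return Counter(r[-1] for r in records).most_common(1)[0][0]
    a = best_attribute(records, attributes) # greedy choice, never revisited
    rest = [x for x in attributes if x != a]
    return {(a, v): build_tree([r for r in records if r[a] == v], rest)
            for v in {r[a] for r in records}}

# Tenure data from the induction slide, with YEARS pre-discretized to years > 6:
train = [("Assistant Prof", False, "no"),  ("Assistant Prof", True,  "yes"),
         ("Professor",      False, "yes"), ("Associate Prof", True,  "yes"),
         ("Assistant Prof", False, "no"),  ("Associate Prof", False, "no")]
print(build_tree(train, [0, 1]))
```

On this toy input the recursion splits on years > 6 first and then on rank, reproducing the rule IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’ from the induction slide. The “no cases left” stopping condition never triggers here because the sketch only branches on values actually observed in the current partition.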

  4. Splitting Ordinal Attributes
     • Multi-way split, e.g., Size → {Small}, {Medium}, {Large}.
     • Binary split, e.g., {Small, Medium} vs. {Large}, or {Medium, Large} vs. {Small}.
     • What about the split {Small, Large} vs. {Medium}? It groups non-adjacent values and so violates the order of the attribute; it is not a valid split for an ordinal attribute.

     Splitting Continuous Attributes
     • Different options:
       – Discretization to form an ordinal categorical attribute:
         • Static: discretize once at the beginning.
         • Dynamic: ranges found by equal-interval bucketing, equal-frequency bucketing (percentiles), or clustering.
       – Binary decision: (A < v) or (A ≥ v); consider all possible splits and choose the best one.
     • Example for Taxable Income:
       – Binary split: Taxable Income > 80K? Yes / No.
       – Multi-way split: Taxable Income in < 10K, [10K, 25K), [25K, 50K), [50K, 80K), > 80K.

     How to Determine the Best Split
     Before splitting there are 10 records of class 0 and 10 records of class 1. Candidate test conditions:
     • Own Car? Yes: C0:6, C1:4; No: C0:4, C1:6.
     • Car Type? Family: C0:1, C1:3; Sports: C0:8, C1:0; Luxury: C0:1, C1:7.
     • Student ID? One record per value c1,…,c20, so every leaf is C0:1, C1:0 or C0:0, C1:1.
     Which test condition is the best?
     • Greedy approach: nodes with a homogeneous class distribution are preferred.
     • Need a measure of node impurity:
       – C0:5, C1:5 → non-homogeneous, high degree of impurity.
       – C0:9, C1:1 → homogeneous, low degree of impurity.

     Attribute Selection Measure: Information Gain
     • Select the attribute with the highest information gain.
     • Let p_i be the probability that an arbitrary record in D belongs to class C_i, i = 1,…,m.
     • Expected information (entropy) needed to classify a record in D:
         Info(D) = -Σ_{i=1}^{m} p_i log2(p_i)
     • Information needed after using attribute A to split D into v partitions D_1,…,D_v:
         Info_A(D) = Σ_{j=1}^{v} (|D_j| / |D|) · Info(D_j)
     • Information gained by splitting on attribute A:
         Gain_A(D) = Info(D) - Info_A(D)
     (A worked computation for the candidate splits above follows below.)
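A worked check of these formulas on the “which test condition is the best?” example: Info(D) = 1 bit for the 10/10 class distribution, and each candidate split is scored by its gain. The split counts come from the slide; the helper names are ours.

```python
# Score the three candidate splits from the slide by information gain.
import math

def info(counts):
    """Info(D) = -sum p_i log2 p_i, given per-class record counts."""
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

def gain(parent_counts, partitions):
    """Gain_A(D) = Info(D) - sum_j |D_j|/|D| * Info(D_j)."""
    n = sum(parent_counts)
    info_a = sum(sum(p) / n * info(p) for p in partitions)
    return info(parent_counts) - info_a

parent   = [10, 10]                          # Info(D) = 1.0 bit
own_car  = [[6, 4], [4, 6]]                  # Yes / No
car_type = [[1, 3], [8, 0], [1, 7]]          # Family / Sports / Luxury
student  = [[1, 0]] * 10 + [[0, 1]] * 10     # one record per ID

for name, split in [("Own Car?", own_car), ("Car Type?", car_type),
                    ("Student ID?", student)]:
    print(f"{name}: gain = {gain(parent, split):.3f}")
```

This prints a gain of about 0.03 bits for Own Car?, 0.62 for Car Type?, and the full 1.0 for Student ID?. The ID attribute maximizes information gain despite being useless for prediction on new records, a well-known bias of plain information gain toward many-valued attributes.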
