Example: Age, Income and Owning a flat - PowerPoint PPT Presentation

Decision Tree Learning
Debapriyo Majumdar
Data Mining – Fall 2014
Indian Statistical Institute Kolkata
August 25, 2014

Example: Age, Income and Owning a flat

[Figure: scatter plot of the training set. x-axis: Age (0 to 70); y-axis: Monthly income in thousand rupees (0 to 250). Points are marked "Owns a house" or "Does not own a house"; two lines, L1 and L2, are drawn on the plot.]

§ If the training data was as above
– Could we define some simple rules by observation?
§ Any point above the line L1 → Owns a house
§ Any point to the right of L2 → Owns a house
§ Any other point → Does not own a house

Example: Age, Income and Owning a flat

[Figure: the same training-set scatter plot (Age vs. monthly income in thousand rupees) with lines L1 and L2. Annotation: "In general, the data won't be such as above".]

Root node: Split at Income = 101
– Income ≥ 101: Label = Yes
– Income < 101: Split at Age = 54
  – Age ≥ 54: Label = Yes
  – Age < 54: Label = No
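The two-level tree above can be read as nested if-statements. The following is a minimal sketch of that reading; the function name `owns_house` and the argument names are illustrative assumptions, while the thresholds 101 and 54 are the ones given on the slide.

```python
def owns_house(age, income):
    """Classify one point with the two-level tree from the slide.

    income is the monthly income in thousand rupees.
    The function and argument names are hypothetical.
    """
    if income >= 101:      # root node: split at Income = 101
        return "Yes"
    if age >= 54:          # second level: split at Age = 54
        return "Yes"
    return "No"
```

For instance, a 30-year-old with a monthly income of 50 thousand rupees falls in the "No" leaf, while raising either the income to 101 or the age to 54 moves the point into a "Yes" leaf.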

Example: Age, Income and Owning a flat

[Figure: the same training-set scatter plot (Age vs. monthly income in thousand rupees), without the lines L1 and L2.]

§ Approach: recursively split the data into partitions so that each partition becomes purer till …
– How to decide the split?
– How to measure purity?
– When to stop?

Approach for splitting
§ What are the possible lines for splitting?
– For each variable, midpoints between pairs of consecutive values for the variable
– How many?
– If N = number of points in training set and m = number of variables
– About O(N × m)
§ How to choose which line to use for splitting?
– The line which reduces impurity (~ heterogeneity of composition) the most
§ How to measure impurity?
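The enumeration of candidate lines can be sketched as follows. This is only an illustration: the function name `candidate_splits` and the toy data are assumptions, and duplicate values are merged before taking midpoints, so the count is at most (N − 1) × m.

```python
def candidate_splits(points):
    """Return (variable_index, threshold) pairs.

    For each variable, the thresholds are the midpoints between
    consecutive distinct values of that variable.
    """
    splits = []
    num_vars = len(points[0])
    for j in range(num_vars):
        values = sorted({p[j] for p in points})   # distinct, ascending
        for lo, hi in zip(values, values[1:]):
            splits.append((j, (lo + hi) / 2))
    return splits


# Hypothetical toy data: (age, monthly income in thousand rupees)
toy_points = [(25, 40), (35, 120), (45, 60), (60, 80)]
toy_splits = candidate_splits(toy_points)
```

With N = 4 points and m = 2 variables, the sketch yields 3 candidate thresholds per variable, in line with the O(N × m) estimate.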

Gini Index for Measuring Impurity
§ Suppose there are C classes
§ Let p(i|t) = fraction of observations belonging to class i in rectangle (node) t
§ Gini index:
$Gini(t) = 1 - \sum_{i=1}^{C} p(i|t)^2$
§ If all observations in t belong to one single class, Gini(t) = 0
§ When is Gini(t) maximum?
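A small sketch of the Gini index computed from the class labels in a node; the function name `gini` and the example labels are illustrative assumptions. It also suggests an answer to the question above: the index is largest when the classes are evenly mixed, where it equals 1 − 1/C.

```python
from collections import Counter


def gini(labels):
    """Gini index of a node, given the list of class labels in it."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


pure_node = gini(["Y"] * 7)                 # one class only
mixed_node = gini(["Y"] * 5 + ["B"] * 2)    # 1 - (25 + 4) / 49 = 20/49
even_node = gini(["Y", "B"])                # two classes, evenly mixed
```

For two classes the evenly mixed node gives 1 − 1/2 = 0.5, the largest value the index can take with C = 2.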

Entropy
§ Average amount of information contained
§ From another point of view – average amount of information expected – hence amount of uncertainty
– We will study this in more detail later
§ Entropy:
$Entropy(t) = -\sum_{i=1}^{C} p(i|t) \times \log_2 p(i|t)$
where $0 \log_2 0$ is defined to be 0
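The entropy of a node can be sketched in the same way as the other impurity measures. The function name `entropy` is an assumption; because only classes that actually occur in the node are counted, the 0 log 0 case never has to be evaluated.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy (in bits) of a node, given the list of class labels in it."""
    n = len(labels)
    total = 0.0
    for count in Counter(labels).values():
        p = count / n               # p > 0 for every class that occurs
        total -= p * math.log2(p)
    return total


pure_entropy = entropy(["Y"] * 4)        # a single class: no uncertainty
even_entropy = entropy(["Y", "N"])       # two equally likely classes
```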

Classification Error
§ What if we stop the tree building at a node
– That is, do not create any further branches for that node
– Make that node a leaf
– Classify the node with the most frequent class present in the node
[Figure annotation: This rectangle (node) is still impure]
§ Classification error as measure of impurity
$ClassificationError(t) = 1 - \max_i [p(i|t)]$
§ Intuitively – the fraction of observations in the rectangle (node) that do not belong to its most frequent class
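As a sketch, the classification error of a node is one minus the share of its most frequent class. The function name `classification_error` and the example node (5 points of class Y, 2 of class B) are illustrative assumptions.

```python
from collections import Counter


def classification_error(labels):
    """1 - max_i p(i|t) for a node, given the class labels in it."""
    n = len(labels)
    most_frequent_count = Counter(labels).most_common(1)[0][1]
    return 1.0 - most_frequent_count / n


# A node labelled with its majority class Y misclassifies the 2 B points.
node_error = classification_error(["Y"] * 5 + ["B"] * 2)   # 2/7
```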

The Full Blown Tree
§ Recursive splitting
§ Suppose we don't stop until all nodes are pure
§ A large decision tree with leaf nodes having very few data points
– Does not represent classes well
– Overfitting
§ Solution:
– Stop earlier, or
– Prune back the tree

[Figure: tree annotated with the number of points per node. Root: 1000; next level: 400 and 600; next level: 200, 200, 160 and 240; leaves with 2, 1 and 5 points are marked "Statistically not significant".]
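Recursive splitting with an early-stopping rule can be sketched as below. This is not the author's implementation: it assumes Gini impurity as the splitting criterion, midpoints as candidate thresholds, and a hypothetical `min_size` parameter (nodes with fewer points are not split further). The dictionary layout of the tree and the toy data are also assumptions.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows):
    """rows: list of (features, label). Return the (variable, threshold)
    with the lowest weighted Gini, or None if no split is possible."""
    best, best_score = None, None
    for j in range(len(rows[0][0])):
        values = sorted({x[j] for x, _ in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [y for x, y in rows if x[j] < thr]
            right = [y for x, y in rows if x[j] >= thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best_score is None or score < best_score:
                best, best_score = (j, thr), score
    return best


def build_tree(rows, min_size=1):
    labels = [y for _, y in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop if the node is pure or has fewer than min_size points.
    if len(set(labels)) == 1 or len(rows) < min_size:
        return {"leaf": majority, "n": len(rows)}
    split = best_split(rows)
    if split is None:               # all points identical: cannot split
        return {"leaf": majority, "n": len(rows)}
    j, thr = split
    left = [r for r in rows if r[0][j] < thr]
    right = [r for r in rows if r[0][j] >= thr]
    return {"var": j, "thr": thr,
            "left": build_tree(left, min_size),
            "right": build_tree(right, min_size)}


def predict(tree, x):
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["var"]] < tree["thr"] else tree["right"]
    return tree["leaf"]


def count_leaves(tree):
    if "leaf" in tree:
        return 1
    return count_leaves(tree["left"]) + count_leaves(tree["right"])


# Hypothetical toy data: ((age, income in thousand rupees), owns a house?)
rows = [((25, 40), "No"), ((40, 60), "No"), ((30, 120), "Yes"),
        ((45, 150), "Yes"), ((60, 50), "Yes"), ((65, 30), "Yes")]
full_tree = build_tree(rows)                  # split until every leaf is pure
stopped_tree = build_tree(rows, min_size=7)   # stops at once: a single leaf
```

With the default `min_size`, the tree grows until it reproduces every training label, which is the full blown tree described above; a larger `min_size` stops earlier and leaves some nodes impure.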

Prune back
§ Pruning step: collapse leaf nodes and make the immediate parent a leaf node
§ Effect of pruning
– Lose purity of nodes
– But were they really pure or was that noise?
– Too many nodes ≈ noise
§ Trade-off between loss of purity and reduction in complexity

[Figure: a decision node (Freq = 7) with two leaf nodes, label = Y (Freq = 5) and label = B (Freq = 2), is pruned to a single leaf node with label = Y (Freq = 7).]

Prune back: cost complexity
§ Cost complexity of a (sub)tree:
§ Classification error (based on training data) and a penalty for size of the tree
$tradeoff(T) = Err(T) + \alpha L(T)$
§ Err(T) is the classification error
§ L(T) = number of leaves in T
§ Penalty factor α is between 0 and 1
– If α = 0, no penalty for bigger tree

[Figure: a decision node (Freq = 7) with two leaf nodes, label = Y (Freq = 5) and label = B (Freq = 2), is pruned to a single leaf node with label = Y (Freq = 7).]
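The pruning decision for the subtree in the figure can be worked through with the trade-off formula. The sketch below assumes that the two leaves are pure (so Err = 0 before pruning) and that Err is measured as the fraction of the 7 points misclassified (so Err = 2/7 after pruning); the function names are hypothetical.

```python
def tradeoff(err, num_leaves, alpha):
    """Cost complexity: Err(T) + alpha * L(T)."""
    return err + alpha * num_leaves


def should_prune(err_subtree, leaves_subtree, err_as_leaf, alpha):
    """Prune when the collapsed single leaf has the lower cost complexity."""
    return tradeoff(err_as_leaf, 1, alpha) < tradeoff(err_subtree, leaves_subtree, alpha)


# Before pruning: 2 leaves, assumed pure. After pruning: 1 leaf labelled Y,
# which misclassifies the 2 points of class B out of 7.
prune_with_large_alpha = should_prune(0.0, 2, 2 / 7, alpha=0.5)
prune_with_small_alpha = should_prune(0.0, 2, 2 / 7, alpha=0.1)
```

Under these assumptions the costs are 2α before pruning and 2/7 + α after, so pruning pays off only when α exceeds 2/7; with α = 0 the larger tree is always kept.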

Different Decision Tree Algorithms
§ Chi-square Automatic Interaction Detector (CHAID)
– Gordon Kass (1980)
– Stop subtree creation if not statistically significant by chi-square test
§ Classification and Regression Trees (CART)
– Breiman et al.
– Decision tree building by Gini index
§ Iterative Dichotomiser 3 (ID3)
– Ross Quinlan (1986)
– Splitting by information gain (difference in entropy)
§ C4.5
– Quinlan's next algorithm, improved over ID3
– Bottom up pruning, both categorical and continuous variables
– Handling of incomplete data points
§ C5.0
– Ross Quinlan's commercial version
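Information gain, the splitting criterion listed for ID3, can be sketched as the entropy of the parent node minus the size-weighted entropy of its children. This is an illustration of the criterion only, not of ID3 itself; the function names and example labels are assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the weighted entropy of the children.

    parent is a list of labels; children is a list of label lists that
    together partition the parent.
    """
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# A split that separates the classes perfectly recovers all of the
# parent's entropy; a split that leaves the mix unchanged gains nothing.
parent = ["Y"] * 5 + ["B"] * 2
perfect_gain = information_gain(parent, [["Y"] * 5, ["B"] * 2])
useless_gain = information_gain(["Y", "Y", "N", "N"], [["Y", "N"], ["Y", "N"]])
```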

Properties of Decision Trees
§ Non parametric approach
– Does not require any prior assumptions regarding the probability distribution of the class and attributes
§ Finding an optimal decision tree is an NP-complete problem
– Heuristics used: greedy, recursive partitioning, top-down, bottom-up pruning
§ Fast to generate, fast to classify
§ Easy to interpret or visualize
§ Error propagation
– An error at the top of the tree propagates all the way down

References
§ Introduction to Data Mining, by Tan, Steinbach, Kumar
– Chapter 4 is available online: http://www-users.cs.umn.edu/~kumar/dmbook/ch4.pdf
