Decision Trees
• Learn from labeled observations - supervised learning
• Represent the knowledge learned in the form of a tree
Example: learning when to play tennis.
● Examples/observations are days with their observed characteristics and whether we played tennis or not
Play Tennis Example

Outlook    Temperature  Humidity  Windy  PlayTennis
Sunny      Hot          High      False  No
Sunny      Hot          High      True   No
Overcast   Hot          High      False  Yes
Rainy      Mild         High      False  Yes
Rainy      Cool         Normal    False  Yes
Rainy      Cool         Normal    True   No
Overcast   Cool         Normal    True   Yes
Sunny      Mild         High      False  No
Sunny      Cool         Normal    False  Yes
Rainy      Mild         Normal    False  Yes
Sunny      Mild         Normal    True   Yes
Overcast   Mild         High      True   Yes
Overcast   Hot          Normal    False  Yes
Rainy      Mild         High      True   No
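The later sketches refer back to this table. A minimal, illustrative way to encode it in code (the `play_tennis` name and the pandas representation are assumptions for these notes, not part of the original slides):

```python
# Hypothetical encoding of the PlayTennis table as a pandas DataFrame.
import pandas as pd

play_tennis = pd.DataFrame(
    [
        ["Sunny",    "Hot",  "High",   False, "No"],
        ["Sunny",    "Hot",  "High",   True,  "No"],
        ["Overcast", "Hot",  "High",   False, "Yes"],
        ["Rainy",    "Mild", "High",   False, "Yes"],
        ["Rainy",    "Cool", "Normal", False, "Yes"],
        ["Rainy",    "Cool", "Normal", True,  "No"],
        ["Overcast", "Cool", "Normal", True,  "Yes"],
        ["Sunny",    "Mild", "High",   False, "No"],
        ["Sunny",    "Cool", "Normal", False, "Yes"],
        ["Rainy",    "Mild", "Normal", False, "Yes"],
        ["Sunny",    "Mild", "Normal", True,  "Yes"],
        ["Overcast", "Mild", "High",   True,  "Yes"],
        ["Overcast", "Hot",  "Normal", False, "Yes"],
        ["Rainy",    "Mild", "High",   True,  "No"],
    ],
    columns=["Outlook", "Temperature", "Humidity", "Windy", "PlayTennis"],
)

print(play_tennis["PlayTennis"].value_counts())  # 9 Yes, 5 No
```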
Decision Tree Learning
[Diagram: Facts or Observations → (Induction) → Theory]
Interpreting a DT
DT ≡ Decision Tree
➔ A DT uses the features of an observation table as nodes and the feature values as links.
➔ All feature values of a particular feature need to be represented as links.
➔ The target feature is special - its values show up as leaf nodes in the DT.
Interpreting a DT
Each path from the root of the DT to a leaf can be interpreted as a decision rule:
IF Outlook = Sunny AND Humidity = Normal THEN PlayTennis = Yes
IF Outlook = Overcast THEN PlayTennis = Yes
IF Outlook = Rainy AND Windy = True THEN PlayTennis = No
DT: Explanation & Prediction
Explanation: the DT summarizes (explains) all the observations in the table perfectly ⇒ 100% accuracy.
Prediction: once we have a DT (a model) we can use it to make predictions on observations that are not in the original training table. Consider:
Outlook = Sunny, Temperature = Mild, Humidity = Normal, Windy = False, PlayTennis = ?
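A hedged sketch of this prediction step (not the lecture's code): fit a tree with scikit-learn and classify the unseen day. The one-hot encoding and the `DecisionTreeClassifier` settings are assumptions made for illustration.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

rows = [
    ("Sunny", "Hot", "High", False, "No"), ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"), ("Rainy", "Mild", "High", False, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"), ("Rainy", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"), ("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"), ("Rainy", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"), ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"), ("Rainy", "Mild", "High", True, "No"),
]
df = pd.DataFrame(rows, columns=["Outlook", "Temperature", "Humidity", "Windy", "PlayTennis"])

# One-hot encode the categorical features; scikit-learn trees need numeric input.
X = pd.get_dummies(df.drop(columns="PlayTennis"))
y = df["PlayTennis"]

tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(tree.score(X, y))  # 1.0 -- the tree explains the training table perfectly

# The new observation, encoded with the same columns as the training data.
new_day = pd.DataFrame(
    [("Sunny", "Mild", "Normal", False)],
    columns=["Outlook", "Temperature", "Humidity", "Windy"],
)
new_X = pd.get_dummies(new_day).reindex(columns=X.columns, fill_value=0)
print(tree.predict(new_X))  # expected: ['Yes'] (Sunny + Normal humidity branch)
```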
Constructing DTs
• How do we choose the attributes and the order in which they appear in a DT?
● Recursive partitioning of the original data table
● Heuristic - each generated partition has to be “less random” (entropy reduction) than previously generated partitions
Entropy
● S is a sample of training examples
● p₊ is the proportion of positive examples in S
● p₋ is the proportion of negative examples in S
● Entropy measures the impurity (randomness) of S:

Entropy(S) ≡ −p₊ log₂ p₊ − p₋ log₂ p₋

For the full PlayTennis table: Entropy(S) = Entropy([9+, 5−]) = .94
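A minimal sketch of this formula in Python (the `entropy` helper name is illustrative):

```python
from math import log2

def entropy(pos: int, neg: int) -> float:
    """Entropy of a sample with `pos` positive and `neg` negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:                       # 0 * log2(0) is taken as 0
            p = count / total
            result -= p * log2(p)
    return result

print(round(entropy(9, 5), 2))   # 0.94  -- the full PlayTennis table
print(entropy(4, 0))             # 0.0   -- a pure partition (e.g. Overcast)
print(entropy(3, 3))             # 1.0   -- maximally impure
```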
Partitioning the Data Set
Splitting on Outlook partitions the table into three subsets:
  Sunny    (5 rows: 2 Yes, 3 No)   E = .97
  Overcast (4 rows: 4 Yes, 0 No)   E = 0
  Rainy    (5 rows: 3 Yes, 2 No)   E = .97
Average entropy = .64 (weighted: .69)
Partitioning in Action
Average entropy after splitting on each candidate attribute:
  Outlook = .640       Humidity = .789
  Temperature = .911   Windy = .892
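A small sketch reproducing this comparison (helper names are illustrative). Note that it computes weighted averages, so for Outlook it yields roughly the .69 weighted value noted on the previous slide rather than the unweighted .640:

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def avg_entropy_after_split(rows, attr_index, label_index=-1):
    """Weighted average entropy of the partitions induced by one attribute."""
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attr_index], []).append(row[label_index])
    n = len(rows)
    return sum(len(part) / n * entropy(part) for part in partitions.values())

data = [  # (Outlook, Temperature, Humidity, Windy, PlayTennis)
    ("Sunny", "Hot", "High", False, "No"), ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"), ("Rainy", "Mild", "High", False, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"), ("Rainy", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"), ("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"), ("Rainy", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"), ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"), ("Rainy", "Mild", "High", True, "No"),
]

for i, name in enumerate(["Outlook", "Temperature", "Humidity", "Windy"]):
    print(name, round(avg_entropy_after_split(data, i), 2))
# Outlook 0.69, Temperature 0.91, Humidity 0.79, Windy 0.89
# -> Outlook yields the largest entropy reduction, so it becomes the root.
```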
Recursive Partitioning
Based on material from the book: "Machine Learning", Tom M. Mitchell, McGraw-Hill, 1997.
Recursive Partitioning
Our data set:

Outlook    Temperature  Humidity  Windy  PlayTennis
Sunny      Hot          High      False  No
Sunny      Hot          High      True   No
Overcast   Hot          High      False  Yes
Rainy      Mild         High      False  Yes
Rainy      Cool         Normal    False  Yes
Rainy      Cool         Normal    True   No
Overcast   Cool         Normal    True   Yes
Sunny      Mild         High      False  No
Sunny      Cool         Normal    False  Yes
Rainy      Mild         Normal    False  Yes
Sunny      Mild         Normal    True   Yes
Overcast   Mild         High      True   Yes
Overcast   Hot          Normal    False  Yes
Rainy      Mild         High      True   No
Recursive Partitioning
Step 1: split on Outlook. The table is partitioned into three subsets:

Outlook = Sunny:
  Sunny     Hot   High    False  No
  Sunny     Hot   High    True   No
  Sunny     Mild  High    False  No
  Sunny     Cool  Normal  False  Yes
  Sunny     Mild  Normal  True   Yes
Outlook = Overcast:
  Overcast  Hot   High    False  Yes
  Overcast  Cool  Normal  True   Yes
  Overcast  Mild  High    True   Yes
  Overcast  Hot   Normal  False  Yes
Outlook = Rainy:
  Rainy     Mild  High    False  Yes
  Rainy     Cool  Normal  False  Yes
  Rainy     Cool  Normal  True   No
  Rainy     Mild  Normal  False  Yes
  Rainy     Mild  High    True   No
Recursive Partitioning
Step 2: the Overcast subset is already pure (all Yes) and becomes a leaf. The Sunny subset is split on Humidity:

Humidity = Normal:
  Sunny  Cool  Normal  False  Yes
  Sunny  Mild  Normal  True   Yes
Humidity = High:
  Sunny  Hot   High    False  No
  Sunny  Hot   High    True   No
  Sunny  Mild  High    False  No
Recursive Partitioning
Step 3: the Rainy subset is split on Windy:

Windy = False:
  Rainy  Mild  High    False  Yes
  Rainy  Cool  Normal  False  Yes
  Rainy  Mild  Normal  False  Yes
Windy = True:
  Rainy  Cool  Normal  True   No
  Rainy  Mild  High    True   No

All partitions are now pure, so the recursion stops.
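To make the recursion explicit, here is a compact ID3-style sketch of this procedure under the entropy heuristic above (function and variable names are illustrative, not the lecture's implementation):

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def build_tree(rows, attrs, label="PlayTennis"):
    labels = [r[label] for r in rows]
    if len(set(labels)) == 1:        # pure partition -> leaf
        return labels[0]
    if not attrs:                    # no attributes left -> majority-class leaf
        return Counter(labels).most_common(1)[0][0]

    # Pick the attribute whose partitions have the lowest weighted entropy.
    def split_entropy(a):
        parts = {}
        for r in rows:
            parts.setdefault(r[a], []).append(r[label])
        return sum(len(p) / len(rows) * entropy(p) for p in parts.values())

    best = min(attrs, key=split_entropy)
    tree = {best: {}}
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        tree[best][value] = build_tree(subset, [a for a in attrs if a != best], label)
    return tree

columns = ["Outlook", "Temperature", "Humidity", "Windy", "PlayTennis"]
rows = [dict(zip(columns, r)) for r in [
    ("Sunny", "Hot", "High", False, "No"), ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"), ("Rainy", "Mild", "High", False, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"), ("Rainy", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"), ("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"), ("Rainy", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"), ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"), ("Rainy", "Mild", "High", True, "No"),
]]
print(build_tree(rows, ["Outlook", "Temperature", "Humidity", "Windy"]))
# e.g. {'Outlook': {'Overcast': 'Yes', 'Sunny': {'Humidity': {...}}, 'Rainy': {'Windy': {...}}}}
```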
Continuous-Valued Attributes
● Sort instances according to the attribute values
● Find candidate splits where the class changes
● Select the split that gives you the highest gain
Consider: the class changes between Temperature 48 and 60 and between 80 and 90, giving candidate splits at (48+60)/2 = 54 and (80+90)/2 = 85.
Highest gain: Temperature > 54
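A sketch of this threshold-selection procedure. The six Temperature values follow the example implied by the candidate splits above; two of them (40 and 72) are assumed for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Candidate splits lie between consecutive values where the class changes."""
    pairs = sorted(zip(values, labels))             # 1. sort by attribute value
    candidates = [
        (pairs[i][0] + pairs[i + 1][0]) / 2         # 2. midpoint of each class change
        for i in range(len(pairs) - 1)
        if pairs[i][1] != pairs[i + 1][1]
    ]

    def gain(t):                                    # 3. information gain of "value > t"
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        n = len(pairs)
        return entropy(labels) - (len(left) / n * entropy(left) +
                                  len(right) / n * entropy(right))

    return max(candidates, key=gain), candidates

temperature = [40, 48, 60, 72, 80, 90]
play        = ["No", "No", "Yes", "Yes", "Yes", "No"]
print(best_threshold(temperature, play))   # (54.0, [54.0, 85.0])
```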
Decision Trees & Patterns in Data
Overfitting – Also True for Trees!
[Plot: training error and test error vs. model complexity (tree depth). Training error keeps falling, while test error becomes high as the tree grows deeper.]
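The same curve can be reproduced with a short experiment (synthetic data and illustrative parameters; the exact numbers will vary):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for depth in (1, 2, 4, 8, 16, None):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    train_err = 1 - tree.score(X_tr, y_tr)
    test_err = 1 - tree.score(X_te, y_te)
    print(depth, round(train_err, 2), round(test_err, 2))
# Training error keeps falling as depth grows; test error eventually rises -- overfitting.
```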
Tree Learning Process
Control the tree complexity - pruning
Pruning
● One of two ways:
1. Prevent the tree from overfitting by limiting the tree depth.
2. Build the whole tree, then remove subtrees and replace them with suitable leaves.
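A sketch of both options with scikit-learn (illustrative dataset and parameters). Option 2 is shown here via cost-complexity pruning (`ccp_alpha`), one common post-pruning method, which need not be the exact procedure the slides have in mind.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)      # option 1
full    = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)                   # unpruned
pruned  = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_tr, y_tr)   # option 2

for name, t in [("depth-limited", shallow), ("full", full), ("ccp-pruned", pruned)]:
    print(name, t.get_n_leaves(), round(t.score(X_te, y_te), 3))
# Pruned trees have far fewer leaves, usually with similar or better test accuracy.
```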
Pruning Example
Subtree Pruning with Deviation
● At each split ask:
  ○ Is the pattern found in the data after splitting statistically significant?
● Prune if the deviation is small – that is, prune if there is no significant information gain.
Given Split
Absence of Pattern
Deviation ⇒ delete the split if Dev is small
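One concrete reading of this test is the chi-squared deviation used in chi-squared pruning. The sketch below assumes a binary Yes/No target and uses illustrative helper names; it may differ in detail from the exact formula on the slide.

```python
from scipy.stats import chi2

def split_deviation(branches):
    """branches: one (yes_count, no_count) pair per child created by the split."""
    total_yes = sum(y for y, _ in branches)
    total_no = sum(n for _, n in branches)
    total = total_yes + total_no
    dev = 0.0
    for yes, no in branches:
        size = yes + no
        exp_yes = total_yes * size / total   # expected counts if the split
        exp_no = total_no * size / total     # carried no information at all
        dev += (yes - exp_yes) ** 2 / exp_yes + (no - exp_no) ** 2 / exp_no
    return dev

# Outlook split of the PlayTennis table: Sunny (2 Yes, 3 No), Overcast (4, 0), Rainy (3, 2).
branches = [(2, 3), (4, 0), (3, 2)]
dev = split_deviation(branches)
p_value = chi2.sf(dev, df=len(branches) - 1)   # chance of a deviation this large under no pattern
print(round(dev, 2), round(p_value, 2))
# A small deviation (large p-value) means no statistically significant pattern: prune the split.
```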