Decision Trees I
Dr. Alex Williams
August 24, 2020
COSC 425: Introduction to Machine Learning, Fall 2020 (CRN: 44874)
Today’s Agenda

We will address:
1. What are decision trees?
2. What functions can we learn with decision trees?
1. What are Decision Trees?
Types of Machine Learning

• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning
Decision Trees: Example Data

Suppose you have data about students’ preferences for courses at UTK. The columns split into input variables (features) and an output variable (target); each row is an example/instance, and the whole collection is a dataset of input-output pairs.

student_id | course_type | course_location | difficulty | grade | … | rating
s1         | ML          | online          | easy       | 80    | … | like
s1         | Compilers   | face-to-face    | easy       | 87    | … | like
s2         | Compilers   | face-to-face    | hard       | 72    | … | dislike
s3         | OS          | online          | hard       | 79    | … | dislike
s3         | Algorithms  | online          | hard       | 85    | … | dislike
s4         | ML          | online          | hard       | 66    | … | like
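To make the input-output pairing concrete, here is a minimal sketch of the slide’s table as plain Python, a list of (features, target) pairs; the elided “…” columns are left out:

    # Each pair is (feature dict, target label), one per table row.
    dataset = [
        ({"student_id": "s1", "course_type": "ML",         "course_location": "online",       "difficulty": "easy", "grade": 80}, "like"),
        ({"student_id": "s1", "course_type": "Compilers",  "course_location": "face-to-face", "difficulty": "easy", "grade": 87}, "like"),
        ({"student_id": "s2", "course_type": "Compilers",  "course_location": "face-to-face", "difficulty": "hard", "grade": 72}, "dislike"),
        ({"student_id": "s3", "course_type": "OS",         "course_location": "online",       "difficulty": "hard", "grade": 79}, "dislike"),
        ({"student_id": "s3", "course_type": "Algorithms", "course_location": "online",       "difficulty": "hard", "grade": 85}, "dislike"),
        ({"student_id": "s4", "course_type": "ML",         "course_location": "online",       "difficulty": "hard", "grade": 66}, "like"),
    ]
    features, labels = zip(*dataset)
    print(len(features), "input-output pairs")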
Goal: Predict whether a student will like a course.

[Figure: the supervised learning pipeline. Training data, a set of input-output pairs (x_i, y_i), feeds a learning algorithm, which outputs a function f. At test time, f maps a new input x to a prediction f(x), which is compared against the true output y.]
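The same pipeline in code, a sketch assuming scikit-learn (the lecture has not named a library), with two of the slide’s features hand-encoded as binary values:

    from sklearn.tree import DecisionTreeClassifier

    # Toy encoding of two features per course: [is it online?, is it easy?]
    X = [[1, 1], [0, 1], [0, 0], [1, 0], [1, 0], [1, 0]]
    y = ["like", "like", "dislike", "dislike", "dislike", "like"]

    clf = DecisionTreeClassifier()   # the learning algorithm
    f = clf.fit(X, y)                # learn f from the (x_i, y_i) pairs
    print(f.predict([[1, 1]]))       # f(x) for a new course: online and easy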
Decision Trees: Questions

Goal: Predict whether a student will like a course.

[Figure: a prospective course x enters the learned function f, which outputs the prediction f(x).]

You: Is the course a Compilers course?  Me: Yes.
You: Is the course online?  Me: Yes.
You: Were past online courses difficult?  Me: Yes.
You: I predict the student will not like this course.
Decision Trees: Questions

Goal: Predict whether a student will like a course.

[Figure: a prospective course x enters the learned function f, which outputs the prediction f(x).]

You: Is the course a Compilers course?  Me: Yes.
You: Is the course online?  Me: Yes.
You: Has the student liked most online courses?  Me: No.
You: I predict the student will not like this course.

Key idea: prediction is about finding questions that matter.
Decision Trees: Questions

[Figure: the learned function f, drawn as a tree, maps a prospective course x to the prediction f(x).]

isCompilers?
  yes → Dislike
  no  → isOnline?
          yes → isMorning?  (yes → Like, no → Dislike)
          no  → isEasy?     (yes → Like, no → Dislike)
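A decision tree is just a function. Read as code, the tree above is a chain of nested conditionals; the structure follows my reading of the figure, which did not extract cleanly:

    def f(course):
        # Each internal node asks one yes/no question about the course.
        if course["isCompilers"]:
            return "Dislike"
        if course["isOnline"]:
            return "Like" if course["isMorning"] else "Dislike"
        return "Like" if course["isEasy"] else "Dislike"

    print(f({"isCompilers": False, "isOnline": True, "isMorning": True}))  # Like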
From Questions to Learning

Terminology for Decision Trees:
• instance = a set of feature values, e.g. <“Compilers”, “online”, “easy”, 80, …, “like”>
• question = a conditional constructed from the features, e.g. isOnline? isEasy? grade > 80? isTaughtByDrWilliams?
• answer = determined by the feature values: yes/no, or categorical (e.g. “online”, “face-to-face”, “hybrid”)
• label / target / class = the output variable, here “rating”
From Questions to Learning

Learning is concerned with finding the “best” tree for the data. We could enumerate all possible trees and evaluate each one. Okay, so how many trees is that? Answer: too many! Finding the optimal tree is NP-hard (see Hyafil and Rivest, 1976).

Thus, we are greedy. We ask: “If I could ask only one question, what would it be?” Alternative framing: “What is the one question that would be most helpful in estimating whether a student will enjoy a particular course?”
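A sketch of that greedy step: score each candidate question by how many training examples the majority label on each side gets right. (Real tree learners typically score with information gain instead, and the tiny dataset here is made up.)

    from collections import Counter

    def score_question(data, feature):
        """Correct predictions if we ask only this yes/no question
        and predict the majority label on each side."""
        correct = 0
        for answer in (True, False):
            labels = [y for x, y in data if x[feature] == answer]
            if labels:
                correct += Counter(labels).most_common(1)[0][1]
        return correct

    data = [
        ({"isOnline": True,  "isEasy": True},  "like"),
        ({"isOnline": True,  "isEasy": False}, "dislike"),
        ({"isOnline": False, "isEasy": True},  "like"),
        ({"isOnline": False, "isEasy": False}, "dislike"),
    ]
    best = max(["isOnline", "isEasy"], key=lambda q: score_question(data, q))
    print(best)  # isEasy: it separates like from dislike perfectly here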
From Questions to Learning

Decision trees split your data: each node represents a question that splits your data, so decision tree learning = choosing what the internal nodes should be.

Questions are conditionals, for example:
• Grade > 80
• Grade in [80, 90]
• Location is in {“online”, “hybrid”, “face-to-face”}
• Teacher is DR_WILLIAMS
• MLgrade * 2 + COMPILERgrade * 3

(Each of these is shown as a runnable predicate in the sketch below.)
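The slide’s example conditionals written as Python predicates over one instance’s feature dict; the feature names and the final threshold are illustrative, not from the lecture:

    # Each question is just a boolean function of one instance's features.
    questions = {
        "grade > 80":        lambda x: x["grade"] > 80,
        "grade in [80, 90]": lambda x: 80 <= x["grade"] <= 90,
        "location known":    lambda x: x["location"] in {"online", "hybrid", "face-to-face"},
        "teacher":           lambda x: x["teacher"] == "DR_WILLIAMS",
        "weighted grades":   lambda x: x["ml_grade"] * 2 + x["compilers_grade"] * 3 > 400,  # threshold is made up
    }

    x = {"grade": 85, "location": "online", "teacher": "DR_WILLIAMS",
         "ml_grade": 80, "compilers_grade": 90}
    for name, q in questions.items():
        print(name, q(x))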
From Questions to Learning

[Figure: distribution of Like/Dislike labels for each candidate question: morning, Compilers, online, easy.]
From Questions to Learning

[Figure: label distributions for two questions. “Compilers” leaves Like/Dislike mixed on both sides (uninformative), while “easy” separates them (informative).]
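The same idea in code: tally the Like/Dislike distribution on each side of a question. A question whose sides stay mixed is uninformative; one whose sides come out pure is informative. Standard learners summarize this with entropy/information gain; the four-example dataset is invented for illustration:

    from collections import Counter

    def label_distribution(data, feature):
        # Distribution of labels on the yes side and the no side of a question.
        return {answer: Counter(y for x, y in data if x[feature] == answer)
                for answer in (True, False)}

    data = [
        ({"isCompilers": True,  "isEasy": True},  "like"),
        ({"isCompilers": True,  "isEasy": False}, "dislike"),
        ({"isCompilers": False, "isEasy": True},  "like"),
        ({"isCompilers": False, "isEasy": False}, "dislike"),
    ]
    print(label_distribution(data, "isCompilers"))  # mixed on both sides: uninformative
    print(label_distribution(data, "isEasy"))       # pure on each side: informative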
2. What Functions Can We Learn?
Supervised Learning: Theory

Problem Setting:
• Set of possible instances: X
• Unknown target function: f: X → Y
• Set of function hypotheses: H = {h | h: X → Y}

The Learning Algorithm:
• Input: training examples {(x_i, y_i)} of the unknown target function f
• Output: the hypothesis h ∈ H that best approximates the target function (Daumé, pg. 9)

The set of all hypotheses that can be “spat out” by a learning algorithm is called the hypothesis space.
Supervised Learning: Theory

Problem Setting:
• Set of possible instances: X — each instance is a feature vector.
• Unknown target function: f: X → Y — here y = 1 if a student likes the course; otherwise, y = 0.
• Set of function hypotheses: H = {h | h: X → Y} — each hypothesis is a decision tree!

The Learning Algorithm:
• Input: training examples {(x_i, y_i)}
• Output: the hypothesis h ∈ H that best approximates the target function
Trees as Functions: Boolean Logic

Translate the tree to Boolean logic. Example: Weather Prediction. [Figure: a weather-prediction tree alongside its equivalent Boolean formula.]
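The weather tree itself did not survive extraction, but the translation works for any tree: OR together the AND of the tests along every path that ends in a positive leaf. Applied to the course tree from earlier (under the same reading of that figure), each disjunct is one root-to-leaf path:

    Like ⇔ (¬isCompilers ∧ isOnline ∧ isMorning) ∨ (¬isCompilers ∧ ¬isOnline ∧ isEasy)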
Example: Cancer Recurrence Prediction

Input variables (features) and output variable (target):

radius | texture | perimeter | … | outcome
18.02  | 27.6    | 117.5     | … | N
17.99  | 10.38   | 122.8     | … | N
20.29  | 14.34   | 135.1     | … | R
…      | …       | …         | … | …

Output variable: N = No Recurrence; R = Recurrence.
Example: Cancer Recurrence Prediction

What does a node represent? A partitioning of the input space.

Internal nodes: a test or question.
• Discrete features: branch on all values.
• Real-valued features: branch on a threshold value.

Leaf nodes: contain the instances that satisfy the tests along the branch.

Remember the following:
• Each instance maps to a particular leaf.
• Each leaf typically contains more than one example.
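A minimal sketch of those two node types; class, field, and feature names ("location", "clinic") are my own, not the lecture’s:

    class Leaf:
        def __init__(self, examples):
            self.examples = examples  # typically holds more than one example

    class DiscreteNode:
        # Discrete feature: one child subtree per feature value.
        def __init__(self, feature, children):
            self.feature, self.children = feature, children

        def route(self, x):
            return self.children[x[self.feature]]

    class ThresholdNode:
        # Real-valued feature: branch on a threshold.
        def __init__(self, feature, threshold, below, above):
            self.feature, self.threshold = feature, threshold
            self.below, self.above = below, above

        def route(self, x):
            return self.above if x[self.feature] > self.threshold else self.below

    def find_leaf(node, x):
        # Every instance maps to exactly one leaf.
        while not isinstance(node, Leaf):
            node = node.route(x)
        return node

    # Route one instance through a made-up two-level tree.
    tree = ThresholdNode("radius", 18.0,
                         below=Leaf(["N", "N"]),
                         above=DiscreteNode("location", {"online": Leaf(["R"]),
                                                         "clinic": Leaf(["N", "R"])}))
    print(find_leaf(tree, {"radius": 20.3, "location": "online"}).examples)  # ['R']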
Example: Cancer Recurrence Prediction

Conversion: decision trees translate to sets of if-then rules.
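A sketch of that conversion using the cancer features; the thresholds are invented for illustration, not taken from a learned tree:

    # Each root-to-leaf path becomes one if-then rule.
    rules = [
        (lambda x: x["radius"] > 18.0 and x["texture"] > 16.0,  "R"),  # recurrence
        (lambda x: x["radius"] > 18.0 and x["texture"] <= 16.0, "N"),
        (lambda x: x["radius"] <= 18.0,                         "N"),  # no recurrence
    ]

    def predict(x):
        for condition, outcome in rules:
            if condition(x):
                return outcome

    print(predict({"radius": 20.29, "texture": 14.34}))  # N under these made-up thresholds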
Example: Cancer Recurrence Prediction

Conversion: decision trees can represent the probability of recurrence.
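Instead of a hard label, a leaf can store the fraction of its training examples with each outcome. A sketch with made-up counts:

    from collections import Counter

    def leaf_probability(leaf_labels, outcome="R"):
        """Estimate P(outcome) at a leaf as the fraction of its training labels."""
        counts = Counter(leaf_labels)
        return counts[outcome] / len(leaf_labels)

    # A hypothetical leaf holding 8 training examples: 6 no-recurrence, 2 recurrence.
    print(leaf_probability(["N"] * 6 + ["R"] * 2))  # 0.25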
Decision Trees: Interpretation

Important: Decision trees form boundaries in your data.
Decision Trees: Interpretation

Predict a person’s interest in skiing or snowboarding.

[Figure: scatter of skiers (+) and snowboarders (-) plotted by Height (H, roughly 100–200) against Width (W, roughly 1–30), next to a tree of axis-aligned tests — H > 148, W > 20, W < 17, H > 125 — whose leaves are labeled Ski or SB (snowboarder). Each test draws a horizontal or vertical boundary, carving the plane into labeled rectangles.]
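One plausible reading of that tree as code (the leaf labels are my reconstruction of a figure that did not extract cleanly); note how each nested threshold test corresponds to one rectangular region of the (W, H) plane:

    def predict(h, w):
        # Axis-aligned thresholds from the figure; each path is a rectangle.
        if h > 148:
            return "Ski"
        if w > 20:
            return "Ski"
        if w < 17:
            return "SB"
        if h > 125:
            return "Ski"
        return "SB"

    print(predict(h=180, w=15))  # Ski: height alone decides on the first test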
Decision Trees: Interpretation

See Ishwaran, H. and Rao, J.S. (2009).
Hypothesis Space

For decision trees, the hypothesis space is the set of all possible finite discrete functions that can be learned from the data:

    f(x) ∈ {category_1, category_2, …, category_N}

Every finite discrete function can be represented by some decision tree. (Hence the need to be greedy!)
Hypothesis Space

Note: encoding challenges.
• Some functions demand exponentially large decision trees to represent.
• Boolean functions can be fully expressed as decision trees: each entry in the truth table can be one root-to-leaf path. (Inefficient!)
• Most Boolean functions can be encoded more compactly (see the sketch below).
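A small illustration, using parity as the classic exponential case:

    from itertools import product

    def truth_table_tree(f, n):
        """The naive encoding: one root-to-leaf path per truth-table row, 2**n leaves."""
        return {bits: f(*bits) for bits in product([0, 1], repeat=n)}

    # n-bit parity is a worst case: every variable matters on every path,
    # so any decision tree for it needs 2**n leaves.
    parity = lambda *bits: sum(bits) % 2
    print(len(truth_table_tree(parity, 4)))  # 16 paths for 4 inputs

    # By contrast, AND has a compact tree: as soon as one bit is 0, answer 0.
    # Depth n, but only n + 1 leaves instead of 2**n.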
Decision Boundaries for Real-Valued Features

Use real-valued features with “nice” bounds.
• Best used when labels occupy “axis-orthogonal” regions of the input space.
Today’s Agenda

We have addressed:
1. What are decision trees?
2. What functions can we learn with decision trees?
Reading

Daumé, Chapter 1.