Decision Trees I


  1. Decision Trees I. Dr. Alex Williams. August 24, 2020. COSC 425: Introduction to Machine Learning, Fall 2020 (CRN: 44874).


  3. Today's Agenda. We will address: (1) What are decision trees? (2) What functions can we learn with decision trees?

  4. 1. What are Decision Trees?

  5. Types of Machine Learning: supervised learning, unsupervised learning, and reinforcement learning.

  6. Decision Trees: Example Data. Suppose you have data about students' preferences for courses at UTK. Each row is an example/instance; together the rows form a dataset of input-output pairs. The columns before "rating" are the input variables (features); "rating" is the output variable (target).

     student_id | course_type | course_location | difficulty | grade | … | rating
     s1         | ML          | online          | easy       | 80    | … | like
     s1         | Compilers   | face-to-face    | easy       | 87    | … | like
     s2         | Compilers   | face-to-face    | hard       | 72    | … | dislike
     s3         | OS          | online          | hard       | 79    | … | dislike
     s3         | Algorithms  | online          | hard       | 85    | … | dislike
     s4         | ML          | online          | hard       | 66    | … | like
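The slide's table is small enough to write down directly; a minimal sketch in plain Python (the columns elided by "…" on the slide stay elided):

```python
# Toy course-preference dataset from slide 6, one dict per example/instance.
dataset = [
    {"student_id": "s1", "course_type": "ML",         "course_location": "online",       "difficulty": "easy", "grade": 80, "rating": "like"},
    {"student_id": "s1", "course_type": "Compilers",  "course_location": "face-to-face", "difficulty": "easy", "grade": 87, "rating": "like"},
    {"student_id": "s2", "course_type": "Compilers",  "course_location": "face-to-face", "difficulty": "hard", "grade": 72, "rating": "dislike"},
    {"student_id": "s3", "course_type": "OS",         "course_location": "online",       "difficulty": "hard", "grade": 79, "rating": "dislike"},
    {"student_id": "s3", "course_type": "Algorithms", "course_location": "online",       "difficulty": "hard", "grade": 85, "rating": "dislike"},
    {"student_id": "s4", "course_type": "ML",         "course_location": "online",       "difficulty": "hard", "grade": 66, "rating": "like"},
]
# Every key except "rating" is an input variable (feature); "rating" is the target.
```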

  7. Goal: Predict whether a student will like a course. (Figure: training data, input-output pairs (x_i, y_i), feeds a learning algorithm, which outputs a function f; for testing data, an input x is fed to f and the prediction f(x) is compared against the true label y.)

  8. Decision Trees: Questions. Goal: Predict whether a student will like a course. (Figure: (1) a prospective course x is given to (2) the learned function f, producing (3) a prediction f(x).) You: Is the course a Compilers course? Me: Yes. You: Is the course online? Me: Yes. You: Were past online courses difficult? Me: Yes. You: I predict the student will not like this course.

  9. Decision Trees: Questions. Goal: Predict whether a student will like a course. Prediction is about finding questions that matter. You: Is the course a Compilers course? Me: Yes. You: Is the course online? Me: Yes. You: Has the student liked most online courses? Me: No. You: I predict the student will not like this course.

  10. Decision Trees: Questions. The questions form a tree:

      isCompilers?
        yes: Dislike
        no:  isOnline?
          yes: isMorning?
            yes: Like
            no:  Dislike
          no:  isEasy?
            yes: Like
            no:  Dislike

  11. From Questions to Learning: Terminology for Decision Trees.
      instance = a set of feature values, e.g. <"Compilers", "online", "easy", 80, …, "like">
      question = a conditional constructed from features, e.g. isOnline? isEasy? grade > 80? isTaughtByDrWilliams?
      question answer = determined by the feature values: yes/no, or categorical (e.g. "online", "face-to-face", "hybrid")
      label / target class = "rating"

  12. From Questions to Learning. Learning is concerned with finding the "best" tree for the data. We could enumerate all possible trees and evaluate each one. Okay, so how many trees is that? Answer: too many! Finding the optimal tree is NP-hard (see Hyafil and Rivest, 1976). Thus, we greedily ask: "If I could ask only one question, what would it be?" Alternative framing: "What is the one question that would be most helpful in estimating whether a student will enjoy a particular course?"

  13. From Questions to Learning: Decision Trees Split Your Data. Each node represents a question that splits your data; decision tree learning = choosing what the internal nodes should be. Questions are conditionals, for example:
      - Grade > 80
      - Grade in [80, 90]
      - Location in {"online", "hybrid", "face-to-face"}
      - Teacher is DR_WILLIAMS
      - MLgrade * 2 + COMPILERgrade * 3
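In code, each candidate question is just a Boolean predicate over an instance; a minimal sketch (not from the slides; the "teacher" feature is hypothetical):

```python
# Candidate questions as predicates over an instance (a dict of feature values).
questions = {
    "grade > 80":         lambda r: r["grade"] > 80,
    "grade in [80, 90]":  lambda r: 80 <= r["grade"] <= 90,
    "location is online": lambda r: r["course_location"] == "online",
    "teacher is Dr. Williams": lambda r: r.get("teacher") == "DR_WILLIAMS",  # hypothetical feature
}
# Learning a decision tree amounts to choosing which predicate to place at each internal node.
```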

  14. From Questions to Learning. (Figure: distribution of Like/Dislike labels for each candidate question: morning, Compilers, online, easy.)

  15. From Questions to Learning. (Figure: label distributions for the "Compilers" and "easy" questions; one split is uninformative, the other informative.)
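Slides 14-15 judge a question by how strongly it skews the Like/Dislike distribution on each side of the split. One standard way to quantify "informative" (my addition; these slides do not name it) is entropy and information gain. A minimal sketch over the slide-6 `dataset` from the earlier snippet:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, question):
    """How much a yes/no question reduces label entropy, on average."""
    yes = [r["rating"] for r in rows if question(r)]
    no  = [r["rating"] for r in rows if not question(r)]
    if not yes or not no:  # degenerate split: everything lands on one side
        return 0.0
    parent = entropy([r["rating"] for r in rows])
    child  = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(rows)
    return parent - child

# Greedy step from slide 12: of the candidate questions, ask the one with the highest gain.
is_online = lambda r: r["course_location"] == "online"
is_easy   = lambda r: r["difficulty"] == "easy"
print(information_gain(dataset, is_online), information_gain(dataset, is_easy))
```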

  16. 2. What Functions Can We Learn?

  17. Supervised Learning: Theory. Problem setting:
      - Set of possible instances: X
      - Unknown target function: f: X → Y
      - Set of function hypotheses: H = { h | h: X → Y }
      The learning algorithm:
      - Input: training examples { (x_i, y_i) }
      - Output: the hypothesis h ∈ H that best approximates the target function (Daumé, pg. 9)
      The set of all hypotheses that can be "spat out" by a learning algorithm is called the hypothesis space.

  18. Supervised Learning: Theory. Problem setting:
      - Set of possible instances: X. Each instance is a feature vector.
      - Unknown target function: f: X → Y, where y = 1 if a student likes the course and y = 0 otherwise.
      - Set of function hypotheses: H = { h | h: X → Y }. Each hypothesis is a decision tree!
      The learning algorithm:
      - Input: training examples { (x_i, y_i) }
      - Output: the hypothesis h ∈ H that best approximates the target function

  19. Trees as Functions: Boolean Logic. A tree can be translated into Boolean logic. (Figure: example, weather prediction.)
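The weather figure is not reproduced here, but the translation is mechanical: each root-to-leaf path is a conjunction of its tests, and the positive class is the disjunction of its positive paths. Applied to the slide-10 course tree as reconstructed above:

```latex
\text{Like} \iff
  (\lnot \text{isCompilers} \land \text{isOnline} \land \text{isMorning})
  \lor
  (\lnot \text{isCompilers} \land \lnot \text{isOnline} \land \text{isEasy})
```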

  20. Example: Cancer Recurrence Prediction. Input variables (features): radius, texture, perimeter, …; output variable (target): outcome, where N = no recurrence and R = recurrence. Each row is an example/instance.

      radius | texture | perimeter | … | outcome
      18.02  | 27.6    | 117.5     | … | N
      17.99  | 10.38   | 122.8     | … | N
      20.29  | 14.34   | 135.1     | … | R
      …      | …       | …         | … | …

  21. Example: Cancer Recurrence Prediction. What does a node represent? A partitioning of the input space.
      - Internal nodes: a test or question. Discrete features branch on all values; real-valued features branch on a threshold value.
      - Leaf nodes: contain the instances that satisfy the tests along the branch.
      Remember the following: each instance maps to a particular leaf, and each leaf typically contains more than one example.

  22. Example: Cancer Recurrence Prediction. (The slide-20 data table shown alongside the slide-21 node descriptions; the real-valued case is here labeled "continuous features: branch on a threshold value.")

  23. Example: Cancer Recurrence Prediction. Conversion: decision trees translate to sets of if-then rules.
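The slide's cancer tree is not reproduced here; as an illustration of the same conversion, here is the slide-10 course tree (as reconstructed above) rewritten as if-then rules, in a minimal Python sketch:

```python
def predict_rating(course):
    """If-then rules read directly off the slide-10 tree, one rule per root-to-leaf path."""
    if course["isCompilers"]:
        return "dislike"
    if course["isOnline"]:
        return "like" if course["isMorning"] else "dislike"
    return "like" if course["isEasy"] else "dislike"

# Example: a non-Compilers, online, morning course.
print(predict_rating({"isCompilers": False, "isOnline": True, "isMorning": True}))  # -> like
```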

  24. Example: Cancer Recurrence Prediction. Conversion: decision trees can represent the probability of recurrence.
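Concretely (my phrasing, not the slide's), a leaf can return the fraction of its training instances with each outcome instead of a hard label:

```latex
\hat{p}(\text{R} \mid \text{leaf}) \;=\;
  \frac{\#\{\text{R instances at the leaf}\}}{\#\{\text{instances at the leaf}\}}
```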

  25. Decision Trees: Interpretation. Important: decision trees form boundaries in your data.

  26. Decision Trees: Interpretation. Predict a person's interest in skiing or snowboarding (+ = Skier, - = Snowboarder). (Figure: a scatter plot of Height (H) versus Width (W) alongside a decision tree with the tests H > 148, W > 20, W < 17, and H > 125; each test carves the plane into axis-aligned regions whose leaves are labeled Ski or SB.)

  27. Decision Trees: Interpretation. (Figure: see Ishwaran, H. and Rao, J.S. (2009).)

  28. Hypothesis Space. For decision trees, the hypothesis space is the set of all possible finite discrete functions that can be learned from the data: f(x) ∈ { category_1, category_2, …, category_N }. Every finite discrete function can be represented by some decision tree. (Hence the need to be greedy!)

  29. Hypothesis Space. Note: encoding challenges.
      - Some functions demand exponentially large decision trees to represent. (The parity/XOR function is the classic case: every path must test every feature.)
      - Boolean functions can be fully expressed as decision trees: each entry in the truth table can be one path. (Inefficient!)
      - Most Boolean functions can be encoded more compactly.

  30. Decision Boundaries. (Figure slide; the accompanying notes repeat the encoding challenges from slide 29.)

  31. Decision Boundaries for Real-Valued Features. Use real-valued features with "nice" bounds; decision trees work best when the labels occupy "axis-orthogonal" regions of the input space.
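As a hedged illustration (scikit-learn is not mentioned in the slides, and the height/width data below is synthetic, loosely mimicking slide 26), a tree learned on two real-valued features branches on thresholds and therefore carves out exactly these axis-orthogonal regions:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic (height, width) points; labels: 1 = Skier, 0 = Snowboarder.
X = [[200, 25], [190, 28], [180, 22], [160, 12], [150, 15],
     [140, 25], [135, 22], [130, 18], [120, 14], [110, 12]]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

clf = DecisionTreeClassifier(max_depth=2, criterion="entropy").fit(X, y)
# Prints the learned threshold tests, e.g. "height <= 145.00".
print(export_text(clf, feature_names=["height", "width"]))
```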

  32. Today's Agenda. We have addressed: (1) What are decision trees? (2) What functions can we learn with decision trees?

  33. Reading: Daumé, Chapter 1.
