Supervised Learning via Decision Trees (Lecture 4)


  1. Supervised Learning via Decision Trees
     Lecture 4
     Wentworth Institute of Technology, COMP4050 Machine Learning, Fall 2015, Derbinsky
     October 13, 2015

  2. Outline
     1. Learning via feature splits
     2. ID3
        – Information gain
     3. Extensions
        – Continuous features
        – Gain ratio
        – Ensemble learning

  3. Decision Trees
     • Sequence of decisions at choice nodes from root to a leaf node
       – Each choice node splits on a single feature
     • Can be used for classification or regression
     • Explicit, easy for humans to understand
     • Typically very fast at testing/prediction time
     https://en.wikipedia.org/wiki/Decision_tree_learning

  4. Weather Example

  5. IRIS Example
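For readers who want to try the IRIS example hands-on, a minimal scikit-learn sketch follows. The library choice is an assumption (the slides do not prescribe a tool), and scikit-learn's DecisionTreeClassifier uses an optimized CART variant rather than ID3, but it illustrates the same sequence-of-splits idea:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    iris = load_iris()

    # Fit a small tree; depth is capped so the printed rules stay readable
    clf = DecisionTreeClassifier(max_depth=2, random_state=0)
    clf.fit(iris.data, iris.target)

    # Show the learned splits as nested if/else rules over the four features
    print(export_text(clf, feature_names=list(iris.feature_names)))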

  6. Training Issues
     • Approximation
       – Optimal tree-building is NP-complete
       – Typically greedy, top-down
     • Bias vs. variance
       – Occam's Razor vs. CC/SSN (ID-like features such as credit card or Social Security numbers split perfectly but generalize poorly)
     • Pruning, ensemble methods
     • Splitting metric
       – Information gain, gain ratio, Gini impurity
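As a concrete illustration of one splitting metric named above, here is a minimal sketch of Gini impurity over a list of class labels; the function name and toy examples are mine, not from the slides:

    from collections import Counter

    def gini_impurity(labels):
        # Probability of misclassifying a randomly drawn example if it were
        # labeled according to the class distribution of this set
        total = len(labels)
        return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

    print(gini_impurity(["yes", "yes", "yes"]))       # 0.0  (pure node)
    print(gini_impurity(["yes", "no", "yes", "no"]))  # 0.5  (maximally mixed, two classes)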

  7. Iterative Dichotomiser 3 (ID3)
     • Invented by Ross Quinlan in 1986
       – Precursor to C4.5/C5.0
     • Categorical data only
     • Greedily consumes features
       – Subtrees cannot consider previous feature(s) for further splits
       – Typically produces shallow trees

  8. ID3: Algorithm Sketch
     • If all examples are the "same", return f(examples)
     • If no more features, return f(examples)
     • A = "best" feature
       – For each distinct value of A:
         • branch = ID3(attributes - {A})

  9. Details
     Classification:
     • "same" = same class
     • f(examples) = majority
     Regression:
     • "same" = std. dev. < ε
     • f(examples) = average
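Putting slides 8-9 together, a minimal Python sketch of the classification variant follows. The data representation (a list of dicts of categorical feature values plus a label key) and the information-gain helpers are my assumptions; the slides only give the outline:

    import math
    from collections import Counter

    def entropy(labels):
        total = len(labels)
        return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

    def information_gain(examples, feature, label_key):
        parent = entropy([ex[label_key] for ex in examples])
        children = 0.0
        for value in {ex[feature] for ex in examples}:
            subset = [ex[label_key] for ex in examples if ex[feature] == value]
            children += len(subset) / len(examples) * entropy(subset)
        return parent - children

    def id3(examples, features, label_key="label"):
        labels = [ex[label_key] for ex in examples]
        # Base case: all examples are the "same" (one class) -> return that class
        if len(set(labels)) == 1:
            return labels[0]
        # Base case: no features left -> f(examples) = majority class
        if not features:
            return Counter(labels).most_common(1)[0][0]
        # A = "best" feature, chosen greedily by information gain
        best = max(features, key=lambda f: information_gain(examples, f, label_key))
        # Recursive step: one branch per distinct value of A, with A removed
        tree = {best: {}}
        for value in {ex[best] for ex in examples}:
            subset = [ex for ex in examples if ex[best] == value]
            tree[best][value] = id3(subset, [f for f in features if f != best], label_key)
        return tree

Calling id3 on a small categorical dataset returns a nested dict mapping each chosen feature to its value branches, with class labels at the leaves.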

  10. Recursion
     • A method of programming in which a function refers to itself in order to solve a problem
       – Example: ID3 calls itself for subtrees
     • Never necessary
       – In some situations, results in simpler and/or easier-to-write code
       – Can often be more expensive in terms of memory + time

  11. Example
     Consider the factorial function:
     n! = ∏_{k=1}^{n} k = 1 · 2 · 3 · ... · n

  12. Iterative Implementation
     def factorial(n):
         result = 1
         for i in range(n):
             result *= (i + 1)
         return result

  13. Consider a Recursive Definition
     Base case: 0! = 1
     Recursive step: n! = n · (n − 1)!  when n ≥ 1

  14. Recursive Implementation
     def factorial_r(n):
         if n == 0:
             return 1
         else:
             return n * factorial_r(n - 1)

  15. How the Code Executes
     Function stack (top frame first):
     factorial_r: return 1
     factorial_r: return 1 * factorial_r(0)
     factorial_r: return 2 * factorial_r(1)
     factorial_r: return 3 * factorial_r(2)
     factorial_r: return 4 * factorial_r(3)
     main: print factorial_r(4)

  16. How the Code Executes
     Function stack (top frame first):
     factorial_r: return 1 * 1
     factorial_r: return 2 * factorial_r(1)
     factorial_r: return 3 * factorial_r(2)
     factorial_r: return 4 * factorial_r(3)
     main: print factorial_r(4)

  17. How the Code Executes
     Function stack (top frame first):
     factorial_r: return 2 * 1
     factorial_r: return 3 * factorial_r(2)
     factorial_r: return 4 * factorial_r(3)
     main: print factorial_r(4)

  18. How the Code Executes
     Function stack (top frame first):
     factorial_r: return 3 * 2
     factorial_r: return 4 * factorial_r(3)
     main: print factorial_r(4)

  19. How the Code Executes
     Function stack (top frame first):
     factorial_r: return 4 * 6
     main: print factorial_r(4)

  20. How the Code Executes
     Function stack (top frame first):
     main: print 24

  21. ID3: Algorithm Sketch
     Base cases:
     • If all examples are the "same", return f(examples)
     • If no more features, return f(examples)
     Recursive step:
     • A = "best" feature
       – For each distinct value of A:
         • branch = ID3(attributes - {A})

  22. Splitting Metric: The "best" Feature
     Classification:
     • Information gain
     Regression:
     • Standard deviation reduction
     Goal: choose splits that proceed from much -> little uncertainty
     http://www.saedsayad.com/decision_tree_reg.htm
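For the regression side, a minimal sketch of standard deviation reduction is shown below, assuming numeric targets partitioned by a candidate split; the function name and toy numbers are mine, not from the slides:

    import statistics

    def std_dev_reduction(targets, groups):
        # Drop in standard deviation from splitting `targets` into `groups`,
        # where `groups` is a list of lists partitioning `targets`
        total = len(targets)
        before = statistics.pstdev(targets)
        after = sum(len(g) / total * statistics.pstdev(g) for g in groups)
        return before - after

    # A split that separates small targets from large ones reduces uncertainty a lot
    targets = [5, 6, 7, 20, 21, 22]
    print(std_dev_reduction(targets, [[5, 6, 7], [20, 21, 22]]))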

  23. Shannon Entropy
     • Measure of "impurity" or uncertainty
     • Intuition: the less likely the event, the more information is transmitted

  24. Entropy Range
     [figure: distributions ranging from small to large entropy]

  25. Quantifying Entropy
     H(X) = E[I(X)]  (expected value of information)
     Discrete:   H(X) = Σ_i P(x_i) I(x_i)
     Continuous: H(X) = ∫ P(x) I(x) dx

  26. Intuition for Information
     We want a definition I(X) = ... that satisfies:
     • I(X) ≥ 0 (information shouldn't be negative)
     • I(1) = 0 (events that always occur communicate no information)
     • I(X1, X2) = I(X1) + I(X2) (information from independent events is additive)

  27. Quantifying Information
     I(X) = log_b(1 / P(X)) = −log_b P(X)
     Log base determines units: 2 = bit (binary digit), 3 = trit, e = nat
     H(X) = −Σ_i P(x_i) log_b P(x_i)
     Log base determines units: 2 = shannon/bit
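A small numeric check of the I(X) definition; the function name is mine, and the base argument selects the unit (2 for bits, 3 for trits, e for nats):

    import math

    def self_information(p, base=2):
        # I(x) = log_b(1 / P(x)) = -log_b P(x)
        return math.log(1 / p, base)

    print(self_information(0.5))               # 1.0 bit (a fair coin flip)
    print(self_information(0.5, base=math.e))  # ~0.693 nats (same event, different unit)
    print(self_information(1.0))               # 0.0 (a certain event carries no information)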

  28. Example: Fair Coin Toss
     I(heads) = log2(1 / 0.5) = log2(2) = 1 bit
     I(tails) = log2(1 / 0.5) = log2(2) = 1 bit
     H(fair toss) = (0.5)(1) + (0.5)(1) = 1 shannon

  29. Example: Double-Headed Coin
     H(double head) = (1) · I(head) = (1) · log2(1/1) = (1) · (0) = 0 shannons

  30. Exercise: Weighted Coin
     Compute the entropy of a coin that will land on heads about 25% of the time, and tails the remaining 75%.
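A quick numeric check of this exercise using the entropy formula from slide 27 (treat it as a sanity check rather than the official solution):

    import math

    p_heads, p_tails = 0.25, 0.75
    H = -(p_heads * math.log2(p_heads) + p_tails * math.log2(p_tails))
    print(round(H, 3))  # roughly 0.811 shannons, less than the fair coin's 1 shannon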
