
BBM406 Fundamentals of Machine Learning Lecture 18: Decision Trees



  1. Photo by Unsplash user @technobulka. BBM406 Fundamentals of Machine Learning, Lecture 18: Decision Trees. Aykut Erdem // Hacettepe University // Fall 2019

  2. Today • Decision Trees • Tree construction • Overfitting • Pruning • Real-valued inputs

  3. Machine Learning in the ER. [Figure: ER timeline from T=0 to disposition at 2 hrs 30 min: triage information (free text), physician documentation and MD comments (free text), specialist consults, repeated vital signs (continuous values, measured every 30 s), lab results (continuous valued).] slide by David Sontag

  4. Can we predict infection? [Figure: the same ER timeline, annotated: “Many crucial decisions about a patient’s care are made here!”] slide by David Sontag

  5. Can we predict infection? • Previous automatic approaches were based on simple criteria: - Temperature < 96.8 °F or > 100.4 °F - Heart rate > 90 beats/min - Respiratory rate > 20 breaths/min • Too simplified... e.g., heart rate depends on age! slide by David Sontag
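
As code, the rule-based screen above is just a handful of threshold tests. The sketch below is illustrative only: the slide does not say how the criteria combine, so an OR is assumed, and the function name is hypothetical.

```python
# The slide's threshold criteria as one rule (OR combination assumed;
# `flag_possible_infection` is a hypothetical name, not from the lecture).
def flag_possible_infection(temp_f, heart_rate, resp_rate):
    abnormal_temp = temp_f < 96.8 or temp_f > 100.4   # °F
    abnormal_hr   = heart_rate > 90                   # beats/min
    abnormal_rr   = resp_rate > 20                    # breaths/min
    return abnormal_temp or abnormal_hr or abnormal_rr
```

This makes the slide's objection concrete: a fixed heart-rate cutoff of 90 beats/min ignores age, even though a normal infant heart rate already exceeds it.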

  6. Can we predict infection? • These are the attributes we have for each patient: - Temperature - Heart rate (HR) - Respiratory rate (RR) - Age - Acuity and pain level - Diastolic and systolic blood pressure (DBP, SBP) - Oxygen saturation (SaO2) • We have these attributes + label (infection) for 200,000 patients! • Let’s learn to classify infection slide by David Sontag

  7. Predicting infection using decision trees slide by David Sontag

  8. Example: Image Classification [Criminisi et al., 2011] slide by Nando de Freitas

  9. Example: Mushrooms http://www.usask.ca/biology/fungi/ slide by Jerry Zhu

  10. Mushroom features
  1. cap-shape: bell=b, conical=c, convex=x, flat=f, knobbed=k, sunken=s
  2. cap-surface: fibrous=f, grooves=g, scaly=y, smooth=s
  3. cap-color: brown=n, buff=b, cinnamon=c, gray=g, green=r, pink=p, purple=u, red=e, white=w, yellow=y
  4. bruises?: bruises=t, no=f
  5. odor: almond=a, anise=l, creosote=c, fishy=y, foul=f, musty=m, none=n, pungent=p, spicy=s
  6. gill-attachment: attached=a, descending=d, free=f, notched=n
  7. ...
  slide by Jerry Zhu

  11. Two mushrooms
  x1 = x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u   y1 = p
  x2 = x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g   y2 = e
  1. cap-shape: bell=b, conical=c, convex=x, flat=f, knobbed=k, sunken=s
  2. cap-surface: fibrous=f, grooves=g, scaly=y, smooth=s
  3. cap-color: brown=n, buff=b, cinnamon=c, gray=g, green=r, pink=p, purple=u, red=e, white=w, yellow=y
  4. …
  slide by Jerry Zhu
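
To make the encoding concrete, here is a small sketch that turns one encoded record into named fields, assuming the feature order follows the data dictionary excerpted above (only the first five of the UCI mushroom dataset's 22 features are listed):

```python
# Decode a comma-separated mushroom record into named features.
FEATURE_NAMES = ["cap-shape", "cap-surface", "cap-color", "bruises", "odor"]

def parse_record(record):
    values = record.split(",")
    return dict(zip(FEATURE_NAMES, values))

# First five codes of mushroom x1 above (label y1 = p, i.e. poisonous):
print(parse_record("x,s,n,t,p"))
# {'cap-shape': 'x', 'cap-surface': 's', 'cap-color': 'n', 'bruises': 't', 'odor': 'p'}
```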

  12. Example: Automobile miles-per-gallon prediction
  mpg   cylinders  displacement  horsepower  weight  acceleration  modelyear  maker
  good  4          low           low         low     high          75to78     asia
  bad   6          medium        medium      medium  medium        70to74     america
  bad   4          medium        medium      medium  low           75to78     europe
  bad   8          high          high        high    low           70to74     america
  bad   6          medium        medium      medium  medium        70to74     america
  bad   4          low           medium      low     medium        70to74     asia
  bad   4          low           medium      low     low           70to74     asia
  bad   8          high          high        high    low           75to78     america
  :     :          :             :           :       :             :          :
  bad   8          high          high        high    low           70to74     america
  good  8          high          medium      high    high          79to83     america
  bad   8          high          high        high    low           75to78     america
  good  4          low           low         low     low           79to83     america
  bad   6          medium        medium      medium  high          75to78     america
  good  4          medium        low         low     low           79to83     america
  good  4          low           low         medium  high          79to83     america
  bad   8          high          high        high    low           70to74     america
  good  4          low           medium      low     medium        75to78     europe
  bad   5          medium        medium      medium  medium        75to78     europe
  slide by Jerry Zhu

  13. Hypotheses: decision trees f : X → Y
  • Each internal node tests an attribute x_i
  • Each branch assigns an attribute value x_i = v
  • Each leaf assigns a class y
  • To classify input x: traverse the tree from root to leaf, output the label y
  Human interpretable!
  [Figure: tree rooted at Cylinders with branches 3, 4, 5, 6, 8; the 4-cylinder branch splits on Maker (america, asia, europe), the 8-cylinder branch on Horsepower (low, med, high), and the remaining branches are good/bad leaves.]
  slide by David Sontag
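
A minimal sketch in Python of this hypothesis class. The `Leaf`/`Node` classes are illustrative, and the leaf labels in the example tree are read off the slide's figure as best as legible, so treat them as approximate:

```python
# Decision tree as nested nodes: internal nodes test an attribute,
# branches select an attribute value, leaves hold a class label.
class Leaf:
    def __init__(self, label):
        self.label = label

class Node:
    def __init__(self, attribute, children):
        self.attribute = attribute   # attribute tested at this node
        self.children = children     # dict: attribute value -> subtree

def classify(tree, x):
    """Traverse from root to leaf, following the branch matching x."""
    while isinstance(tree, Node):
        tree = tree.children[x[tree.attribute]]
    return tree.label

# The slide's tree (leaf labels approximate): split on cylinders,
# then on maker (4 cylinders) or horsepower (8 cylinders).
mpg_tree = Node("cylinders", {
    3: Leaf("good"),
    4: Node("maker", {"america": Leaf("bad"),
                      "asia": Leaf("good"),
                      "europe": Leaf("good")}),
    5: Leaf("bad"),
    6: Leaf("bad"),
    8: Node("horsepower", {"low": Leaf("bad"),
                           "med": Leaf("good"),
                           "high": Leaf("bad")}),
})

print(classify(mpg_tree, {"cylinders": 4, "maker": "asia"}))  # -> good
```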

  14. Hypothesis space
  • How many possible hypotheses?
  • What functions can be represented?
  [Figure: the same Cylinders / Maker / Horsepower tree as on the previous slide.]
  slide by David Sontag

  15. What functions can be represented?
  • Decision trees can represent any function of the input attributes!
  • For Boolean functions, a path to a leaf gives a truth table row. (Figure from Stuart Russell.)
    A B | A xor B
    F F |    F
    F T |    T
    T F |    T
    T T |    F
  • But this could require exponentially many nodes…
  [Figure: the Cylinders / Maker / Horsepower tree again, which computes cyl=3 ∨ (cyl=4 ∧ (maker=asia ∨ maker=europe)) ∨ …]
  slide by David Sontag
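
A quick sanity check (not from the lecture) that the depth-2 tree in Russell's figure reproduces the XOR truth table above:

```python
# The XOR tree: root tests A; each branch then tests B.
def xor_tree(a, b):
    if a:
        return not b   # A=T branch: B=F -> T, B=T -> F
    else:
        return b       # A=F branch: B=F -> F, B=T -> T

# Verify all four truth-table rows.
for a in (False, True):
    for b in (False, True):
        assert xor_tree(a, b) == (a != b)
```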

  16. Are all decision trees equal?
  • Many trees can represent the same concept
  • But not all trees will have the same size - e.g., φ = (A ∧ B) ∨ (¬A ∧ C)
  [Figure: two equivalent trees for φ, a small one rooted at A (whose subtrees test B and C) and a larger one rooted at B (whose subtrees test A and C); leaves are labeled + and −.]
  • Which tree do we prefer?
  slide by David Sontag

  17. Learning decision trees is hard!!!
  • Learning the simplest (smallest) decision tree is an NP-complete problem [Hyafil & Rivest ’76]
  • Resort to a greedy heuristic: - Start from an empty decision tree - Split on the next best attribute (feature) - Recurse (see the sketch below)
  slide by David Sontag
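
A sketch of the greedy heuristic (ID3-style), reusing the `Leaf`/`Node` classes from the earlier sketch. `best_attribute` is a stand-in here: the real criterion (information gain) is developed on the entropy slides below. Note that the two base cases anticipate the stopping rules on slide 22.

```python
from collections import Counter

def majority_label(labels):
    """Majority vote over a list of class labels."""
    return Counter(labels).most_common(1)[0][0]

def best_attribute(examples, labels, attributes):
    # Placeholder only: the greedy criterion (information gain) is
    # defined later in the lecture; here we naively take any attribute.
    return next(iter(attributes))

def build_tree(examples, labels, attributes):
    """Greedy recursion. `examples` are dicts, `attributes` a set of names."""
    # Base case 1: all examples share the same label.
    if len(set(labels)) == 1:
        return Leaf(labels[0])
    # Base case 2: no questions left to ask -> majority vote.
    if not attributes:
        return Leaf(majority_label(labels))
    # Greedy step: split on the (locally) best attribute, recurse on each part.
    a = best_attribute(examples, labels, attributes)
    children = {}
    for v in {x[a] for x in examples}:
        idx = [i for i, x in enumerate(examples) if x[a] == v]
        children[v] = build_tree([examples[i] for i in idx],
                                 [labels[i] for i in idx],
                                 attributes - {a})
    return Node(a, children)
```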

  18. A Decision Stump
  Internal node question: “What is the number of cylinders?”
  Leaves: classify by majority vote
  slide by Jerry Zhu

  19. Key idea: Greedily learn trees using recursion
  Take the original dataset and partition it according to the value of the attribute we split on: records in which cylinders = 4, records in which cylinders = 5, records in which cylinders = 6, records in which cylinders = 8.
  slide by David Sontag
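
The partition step pictured here, as a small standalone helper (assumed, not shown in the slides; `mpg_rows` is a hypothetical list of row dicts):

```python
from collections import defaultdict

def partition(rows, attribute):
    """Group dataset rows by their value of one attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row)
    return groups

# e.g. partition(mpg_rows, "cylinders") yields one group per value:
# {4: [...], 5: [...], 6: [...], 8: [...]}
```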

  20. Recursive Step
  Build a tree from each partition: the records in which cylinders = 4, the records in which cylinders = 5, the records in which cylinders = 6, and the records in which cylinders = 8.
  slide by David Sontag

  21. Second level of tree
  Recursively build a tree from the seven records in which there are four cylinders and the maker was based in Asia. (Similar recursion in the other cases.)
  slide by David Sontag

  22. The full decision tree
  Stopping rules: 1. Do not split when all examples have the same label. 2. Cannot split when we run out of questions.
  slide by Jerry Zhu

  23. Splitting: Choosing a good attribute
  • Would we prefer to split on X1 or X2?
    X1 X2 Y
    T  T  T
    T  F  T
    T  T  T
    T  F  T
    F  T  T
    F  F  F
    F  T  F
    F  F  F
  Split on X1: X1=t → Y=t: 4, Y=f: 0;  X1=f → Y=t: 1, Y=f: 3
  Split on X2: X2=t → Y=t: 3, Y=f: 1;  X2=f → Y=t: 2, Y=f: 2
  Idea: use counts at leaves to define probability distributions, so we can measure uncertainty!
  slide by David Sontag
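
The idea in that last line, sketched as a helper (hypothetical name) that turns the label counts at a leaf into an empirical distribution:

```python
from collections import Counter

def leaf_distribution(labels):
    """Empirical P(Y) from the label counts at a leaf."""
    counts = Counter(labels)
    n = len(labels)
    return {y: c / n for y, c in counts.items()}

# X1 = t branch above: 4 true, 0 false -> fully certain.
print(leaf_distribution(list("tttt")))   # {'t': 1.0}
# X2 = f branch above: 2 true, 2 false -> maximally uncertain.
print(leaf_distribution(list("ttff")))   # {'t': 0.5, 'f': 0.5}
```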

  24. Measuring uncertainty
  • Good split if we are more certain about classification after the split - Deterministic: good (all true or all false) - Uniform distribution: bad - What about distributions in between?
    P(Y=A) = 1/2, P(Y=B) = 1/4, P(Y=C) = 1/8, P(Y=D) = 1/8
    P(Y=A) = 1/4, P(Y=B) = 1/4, P(Y=C) = 1/4, P(Y=D) = 1/4
  slide by David Sontag

  25. Entropy
  • Entropy H(Y) of a random variable Y
  • More uncertainty, more entropy!
  • Information Theory interpretation: H(Y) is the expected number of bits needed to encode a randomly drawn value of Y (under the most efficient code)
  [Figure: entropy of a coin flip as a function of the probability of heads.]
  slide by David Sontag
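
The formula itself did not survive the slide export; the standard definition it refers to is

```latex
H(Y) = -\sum_{i=1}^{k} P(Y = y_i)\,\log_2 P(Y = y_i)
```

For the two distributions on slide 24 this gives 1/2·1 + 1/4·2 + 1/8·3 + 1/8·3 = 1.75 bits for the skewed one versus 2 bits for the uniform one, matching the intuition that uniform is the most uncertain.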

  26. High, Low Entropy
  • “High Entropy” - Y is from a uniform-like distribution - Flat histogram - Values sampled from it are less predictable
  • “Low Entropy” - Y is from a varied (peaks and valleys) distribution - Histogram has many lows and highs - Values sampled from it are more predictable
  slide by Vibhav Gogate

  27. Entropy Example
  Data (X1, X2, Y): (T,T,T), (T,F,T), (T,T,T), (T,F,T), (F,T,T), (F,F,F)
  P(Y=t) = 5/6, P(Y=f) = 1/6
  H(Y) = −5/6 log2(5/6) − 1/6 log2(1/6) ≈ 0.65
  [Figure: entropy of a coin flip vs. probability of heads.]
  slide by David Sontag
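
Checking the slide's arithmetic with a small entropy function (a sketch of the definition above):

```python
from math import log2

def entropy(probs):
    """H(Y) = -sum_y P(y) log2 P(y); terms with P(y) = 0 contribute 0."""
    return -sum(p * log2(p) for p in probs if p > 0)

print(round(entropy([5/6, 1/6]), 2))  # 0.65, as on the slide
```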

  28. Conditional Entropy
  Conditional entropy H(Y|X) of a random variable Y conditioned on a random variable X.
  Example (same data as the previous slide):
  P(X1=t) = 4/6, P(X1=f) = 2/6
  Split on X1: X1=t → Y=t: 4, Y=f: 0;  X1=f → Y=t: 1, Y=f: 1
  H(Y|X1) = −4/6 (1 log2 1 + 0 log2 0) − 2/6 (1/2 log2 1/2 + 1/2 log2 1/2) = 2/6
  slide by David Sontag
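
And the same computation in code, using H(Y|X) = Σ_x P(X=x) H(Y|X=x) with the entropy helper from the previous sketch:

```python
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

# X1=t branch is pure (4 true, 0 false); X1=f branch is 50/50 (1 true, 1 false).
h_y_given_x1 = (4/6) * entropy([1.0]) + (2/6) * entropy([0.5, 0.5])
print(h_y_given_x1)  # 0.333... = 2/6, matching the slide
```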
