
Machine Learning for Signal Processing (11755/18979): Detecting Faces (& Other Objects) in Images
Class 7, 22 Sep 2015

Last Lecture: How to describe a face. The typical face: a typical face that captures the essence of ...


1. Boosting
• The basic idea: Can a "weak" learning algorithm that performs just slightly better than a random guess be boosted into an arbitrarily accurate "strong" learner?
• This is a "meta" algorithm that poses no constraints on the form of the weak learners themselves

2. Boosting: A Voting Perspective
• Boosting is a form of voting
– Let a number of different classifiers classify the data
– Go with the majority
– Intuition says that as the number of classifiers increases, the dependability of the majority vote increases
• Boosting by majority
• Boosting by weighted majority
– A (weighted) majority vote taken over all the classifiers
– How do we compute weights for the classifiers?
– How do we actually train the classifiers?
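To make the weighted-majority idea concrete, here is a minimal sketch (the classifiers and weights are made up for illustration; they are not from the lecture): each classifier votes +1 or -1, and the ensemble returns the sign of the weighted vote sum.

```python
import numpy as np

def weighted_majority(classifiers, weights, x):
    # Each classifier returns +1 or -1; the ensemble takes a weighted vote.
    votes = np.array([h(x) for h in classifiers])
    return int(np.sign(np.dot(weights, votes)))

# Three toy threshold classifiers on a scalar input (illustrative only):
hs = [lambda x: 1 if x > 0 else -1,
      lambda x: 1 if x > 1 else -1,
      lambda x: 1 if x > -1 else -1]
print(weighted_majority(hs, [0.5, 0.2, 0.3], 0.5))  # -> 1 (votes +1, -1, +1)
```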

3. AdaBoost
• Challenge: how to optimize the classifiers and their weights?
– Trivial solution: train all classifiers independently
– Optimal: each classifier focuses on what the others missed
– But joint optimization becomes impossible
• Adaptive Boosting: greedy incremental optimization of classifiers
– Keep adding classifiers incrementally, to fix what the others missed

4. AdaBoost: Illustrative Example

5. AdaBoost: First Weak Learner

6. AdaBoost: The First Weak Learner Makes Errors

7. AdaBoost: Reweighted Data

8. AdaBoost: Second Weak Learner Focuses on Data "Missed" by the First Learner

9. AdaBoost: Second Strong Learner Combines Both Weak Learners

10. AdaBoost: Returning to the Second Weak Learner

11. AdaBoost: The Second Weak Learner Makes Errors

12. AdaBoost: Reweighting the Data

13. AdaBoost: Third Weak Learner Focuses on Data "Missed" by the First and Second Learners

14. AdaBoost: Third Strong Learner

15. Boosting: An Example
• Red dots represent training data from the Red class
• Blue dots represent training data from the Blue class

16. Boosting: An Example
• The final strong learner has learnt a complicated decision boundary

17. Boosting: An Example
• The final strong learner has learnt a complicated decision boundary
• Decision boundaries in areas with a low density of training points are assumed inconsequential

18. Overall Learning Pattern
• The strong learner becomes increasingly accurate with an increasing number of weak learners
• Residual errors become increasingly difficult to correct
– Additional weak learners become less and less effective
[Plot: error of the n-th weak learner and error of the n-th strong learner vs. number of weak learners]

19. Overfitting
• Note: we can continue to add weak learners even after the strong learner's error goes to 0!
• This has been shown to improve generalization!
[Plot: error of the n-th weak learner and error of the n-th strong learner (which may go to 0) vs. number of weak learners]

20. AdaBoost: Summary
• No relation to Ada Lovelace
• Adaptive Boosting
• Adaptively selects weak learners
• ~8K citations for just one paper by Freund and Schapire

21. The AdaBoost Algorithm
• Initialize D_1(x_i) = 1/N
• For t = 1, …, T:
– Train a weak classifier h_t using distribution D_t
– Compute the total error on the training data: e_t = Σ_i D_t(x_i) · ½(1 - y_i h_t(x_i))
– Set α_t = ½ ln((1 - e_t) / e_t)
– For i = 1…N: set D_{t+1}(x_i) = D_t(x_i) exp(-α_t y_i h_t(x_i))
– Normalize D_{t+1} to make it a distribution
• The final classifier is H(x) = sign(Σ_t α_t h_t(x))
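A minimal sketch of this loop in Python/NumPy, assuming the weak learners are trained by a caller-supplied `train_weak` helper (a hypothetical name, not part of the slides):

```python
import numpy as np

def adaboost(X, y, T, train_weak):
    """AdaBoost as on the slide. X: (N, d) features; y: (N,) labels in {-1, +1}.
    train_weak(X, y, D) must return a classifier h with h(X) -> (N,) array in {-1, +1}."""
    N = len(y)
    D = np.full(N, 1.0 / N)                    # initialize D_1(x_i) = 1/N
    hs, alphas = [], []
    for t in range(T):
        h = train_weak(X, y, D)                # train weak classifier with distribution D_t
        pred = h(X)
        e = np.sum(D * 0.5 * (1 - y * pred))   # weighted training error e_t
        alpha = 0.5 * np.log((1 - e) / e)      # alpha_t = 1/2 ln((1 - e_t)/e_t)
        D = D * np.exp(-alpha * y * pred)      # reweight: up-weight the mistakes
        D = D / D.sum()                        # normalize to a distribution
        hs.append(h)
        alphas.append(alpha)
    # Final classifier: H(x) = sign(sum_t alpha_t h_t(x))
    return lambda Xq: np.sign(sum(a * h(Xq) for a, h in zip(alphas, hs)))
```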

22. First, Some Example Data
• Face detection with multiple eigenfaces
• Step 0: Derive the top two eigenfaces E1 and E2 from the eigenface training data
• Step 1: On a (different) set of examples, express each image as a linear combination of the eigenfaces: Image = a*E1 + b*E2, with a = Image.E1
– Examples include both faces and non-faces
– Even the non-face images are explained in terms of the eigenfaces
[Figure: eight example images with their decompositions, e.g. 0.3 E1 - 0.6 E2, 0.5 E1 - 0.5 E2, 0.7 E1 - 0.1 E2, 0.6 E1 - 0.4 E2, 0.2 E1 + 0.4 E2, -0.8 E1 - 0.1 E2, 0.4 E1 - 0.9 E2, 0.2 E1 + 0.5 E2]
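A sketch of Step 1, assuming the eigenfaces are stored as orthonormal flattened vectors (the variable names are illustrative):

```python
import numpy as np

def eigenface_coeffs(image, E1, E2):
    # Project a flattened image onto two orthonormal eigenfaces:
    # Image ~= a*E1 + b*E2, with a = Image . E1 and b = Image . E2
    a = np.dot(image, E1)
    b = np.dot(image, E2)
    return a, b
```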

23. Training Data
(Face = +1, Non-face = -1)

ID    E1     E2    Class
A     0.3   -0.6    +1
B     0.5   -0.5    +1
C     0.7   -0.1    +1
D     0.6   -0.4    +1
E     0.2    0.4    -1
F    -0.8   -0.1    -1
G     0.4   -0.9    -1
H     0.2    0.5    -1

24. The AdaBoost Algorithm (repeat of slide 21; current step: initialize D_1(x_i) = 1/N)

25. Initialize D_1(x_i) = 1/N

26. Training Data

ID    E1     E2    Class   Weight
A     0.3   -0.6    +1      1/8
B     0.5   -0.5    +1      1/8
C     0.7   -0.1    +1      1/8
D     0.6   -0.4    +1      1/8
E     0.2    0.4    -1      1/8
F    -0.8   -0.1    -1      1/8
G     0.4   -0.9    -1      1/8
H     0.2    0.5    -1      1/8

27. The AdaBoost Algorithm (repeat of slide 21; current step: train a weak classifier h_t using distribution D_t)

28. The E1 "Stump"
• Classifier based on E1: if sign*(wt(E1) - thresh) > 0, face = true (sign = +1 or -1)
• Instances ordered by E1 value, each with weight 1/8: F(-0.8), E(0.2), H(0.2), A(0.3), G(0.4), B(0.5), D(0.6), C(0.7)
• At the threshold shown: sign = +1 gives error 3/8; sign = -1 gives error 5/8
(Data table as on slide 26.)
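A quick check of the weighted error for one threshold position on this data (a sketch; the threshold -0.5 is just one candidate position, between F and E):

```python
import numpy as np

e1 = np.array([0.3, 0.5, 0.7, 0.6, 0.2, -0.8, 0.4, 0.2])  # A..H, E1 values
y  = np.array([  1,   1,   1,   1,  -1,   -1,  -1,  -1])  # face = +1
D  = np.full(8, 1/8)                                       # current weights

def stump_error(sign, thresh):
    # Predict face (+1) where sign*(wt(E1) - thresh) > 0; the error is the
    # total weight of the misclassified instances.
    pred = np.where(sign * (e1 - thresh) > 0, 1, -1)
    return np.sum(D[pred != y])

print(stump_error(+1, -0.5))  # 0.375 = 3/8 (E, H, G misclassified)
print(stump_error(-1, -0.5))  # 0.625 = 5/8
```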

29–37. The E1 "Stump" (threshold sweep; slides 29–32 repeat slide 28)
• The threshold is swept across the instances, and the weighted error is evaluated at each position for both signs
• For sign = +1, the error at successive threshold positions goes 3/8, 2/8, 1/8, 2/8, 1/8, 2/8; sign = -1 always gives the complementary error (5/8, 6/8, 7/8, 6/8, …)
• The minimum error for a stump on E1 is 1/8

38. The Best E1 "Stump"
• Sign = +1, threshold = 0.45: if wt(E1) > 0.45, face = true
• Error = 1/8 (only A, a face with wt(E1) = 0.3, is misclassified)
(Data table as on slide 26.)

39. The E2 "Stump"
• Classifier based on E2: if sign*(wt(E2) - thresh) > 0, face = true (sign = +1 or -1)
• Note the order of instances along E2, each with weight 1/8: G(-0.9), A(-0.6), B(-0.5), D(-0.4), C(-0.1), F(-0.1), E(0.4), H(0.5)
• At the threshold shown: sign = +1 gives error 3/8; sign = -1 gives error 5/8

40. The Best E2 "Stump"
• Sign = -1, threshold = 0.15: if wt(E2) < 0.15, face = true
• Error = 2/8

41. The Best "Stump"
• The best overall classifier based on a single feature uses E1: if wt(E1) > 0.45, face = true
• Sign = +1, error = 1/8
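The search over both features, both signs, and all candidate thresholds can be sketched as below (a hypothetical `best_stump` helper, consistent with the slides' numbers):

```python
import numpy as np

X = np.array([[ 0.3, -0.6], [ 0.5, -0.5], [0.7, -0.1], [0.6, -0.4],
              [ 0.2,  0.4], [-0.8, -0.1], [0.4, -0.9], [0.2,  0.5]])  # A..H
y = np.array([1, 1, 1, 1, -1, -1, -1, -1])
D = np.full(8, 1/8)

def best_stump(X, y, D):
    best = None
    for f in range(X.shape[1]):          # feature: 0 = E1, 1 = E2
        vals = np.unique(X[:, f])
        # Candidate thresholds: midpoints between sorted values, plus the ends.
        cands = np.concatenate(([vals[0] - 0.1],
                                (vals[:-1] + vals[1:]) / 2,
                                [vals[-1] + 0.1]))
        for thresh in cands:
            for sign in (+1, -1):
                pred = np.where(sign * (X[:, f] - thresh) > 0, 1, -1)
                err = np.sum(D[pred != y])
                if best is None or err < best[0]:
                    best = (err, f, sign, thresh)
    return best

# -> (0.125, 0, 1, 0.25): error 1/8 on E1; threshold 0.45 ties with 0.25
# (the slides pick 0.45, which misclassifies A instead of G).
print(best_stump(X, y, D))
```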

42. The Best "Stump"

43. The AdaBoost Algorithm (repeat of slide 21; current step: compute the total error e_t)

44. The Best "Stump"

45. The Best Error
• The error of the classifier is the sum of the weights of the misclassified instances
• For the best stump (sign = +1, threshold 0.45): error = 1/8
(Data table as on slide 26.)

46. The AdaBoost Algorithm (repeat of slide 21; current step: set α_t = ½ ln((1 - e_t) / e_t))

47. Computing Alpha
• Sign = +1, error = 1/8
• α_1 = ½ ln((1 - 1/8) / (1/8)) = ½ ln 7 ≈ 0.97
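Checking the arithmetic:

```python
import math

e = 1/8
alpha = 0.5 * math.log((1 - e) / e)  # = 0.5 * ln(7)
print(round(alpha, 2))               # 0.97
```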

48. The Boosted Classifier Thus Far
• h1(X) = wt(E1) > 0.45 ? +1 : -1, with α_1 = 0.97
• H(X) = sign(0.97 * h1(X)); with a single learner this is the same as h1(X)

49. The AdaBoost Algorithm (repeat of slide 21; current step: set D_{t+1}(x_i) = D_t(x_i) exp(-α_t y_i h_t(x_i)))

50. Reweighting the Data
• D_{t+1}(x_i) = D_t(x_i) exp(-α_t y_i h_t(x_i))
• exp(α_t) = exp(0.97) = 2.63; exp(-α_t) = exp(-0.97) = 0.38
• Multiply the correctly classified instances by 0.38 and the misclassified instances by 2.63:

ID    E1     E2    Class   Old weight     New weight
A     0.3   -0.6    +1     1/8 * 2.63       0.33
B     0.5   -0.5    +1     1/8 * 0.38       0.05
C     0.7   -0.1    +1     1/8 * 0.38       0.05
D     0.6   -0.4    +1     1/8 * 0.38       0.05
E     0.2    0.4    -1     1/8 * 0.38       0.05
F    -0.8   -0.1    -1     1/8 * 0.38       0.05
G     0.4   -0.9    -1     1/8 * 0.38       0.05
H     0.2    0.5    -1     1/8 * 0.38       0.05
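The same update in code, reproducing the 2.63 / 0.38 factors (A is the one instance the E1 > 0.45 stump misclassifies):

```python
import numpy as np

alpha = 0.97
y    = np.array([ 1, 1, 1, 1, -1, -1, -1, -1])  # A..H true labels
pred = np.array([-1, 1, 1, 1, -1, -1, -1, -1])  # stump E1 > 0.45 misses A
D    = np.full(8, 1/8)

D_new = D * np.exp(-alpha * y * pred)  # correct: x0.38, wrong: x2.63
print(np.round(D_new, 2))              # [0.33 0.05 0.05 0.05 0.05 0.05 0.05 0.05]
```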

  51. AdaBoost 11755/18979 90

  52. AdaBoost 11755/18979 91

53. The AdaBoost Algorithm (repeat of slide 21; current step: normalize D_{t+1} to make it a distribution)

54. Normalizing the Weights
• D' = D / sum(D) (the unnormalized weights sum to 0.68)

ID    Class   Unnormalized   Normalized
A      +1       0.33           0.48
B      +1       0.05           0.074
C      +1       0.05           0.074
D      +1       0.05           0.074
E      -1       0.05           0.074
F      -1       0.05           0.074
G      -1       0.05           0.074
H      -1       0.05           0.074
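And the normalization step:

```python
import numpy as np

D_new = np.array([0.33, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05])
D_norm = D_new / D_new.sum()  # sum = 0.68
print(np.round(D_norm, 3))    # [0.485 0.074 0.074 ... 0.074] (the slide rounds A to 0.48)
```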

55. Normalizing the Weights (result)

ID    E1     E2    Class   Weight
A     0.3   -0.6    +1     0.48
B     0.5   -0.5    +1     0.074
C     0.7   -0.1    +1     0.074
D     0.6   -0.4    +1     0.074
E     0.2    0.4    -1     0.074
F    -0.8   -0.1    -1     0.074
G     0.4   -0.9    -1     0.074
H     0.2    0.5    -1     0.074

56. The AdaBoost Algorithm (repeat of slide 21; back to the top of the loop: train the next weak classifier h_2 using distribution D_2)

57. The E1 Classifier (round 2)
• Classifier based on E1: if sign*(wt(E1) - thresh) > 0, face = true (sign = +1 or -1)
• Instances ordered by E1, now with the new weights: F(-0.8, .074), E(0.2, .074), H(0.2, .074), A(0.3, .48), G(0.4, .074), B(0.5, .074), D(0.6, .074), C(0.7, .074)
• At the threshold shown: sign = +1 gives error 0.222; sign = -1 gives error 0.778

58. The E1 Classifier (round 2, continued)
• At the next threshold position: sign = +1 gives error 0.148; sign = -1 gives error 0.852

59. The Best E1 Classifier (round 2)
• Sign = +1, error = 0.074 (only G, weight 0.074, is misclassified)
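Re-running the threshold scan with the reweighted data reproduces these numbers (the same `stump_error` sketch as before, with the new weights):

```python
import numpy as np

e1 = np.array([0.3, 0.5, 0.7, 0.6, 0.2, -0.8, 0.4, 0.2])  # A..H
y  = np.array([  1,   1,   1,   1,  -1,   -1,  -1,  -1])
D  = np.array([0.48] + [0.074] * 7)                        # weights after round 1

def stump_error(sign, thresh):
    pred = np.where(sign * (e1 - thresh) > 0, 1, -1)
    return np.sum(D[pred != y])

print(round(stump_error(+1, -0.5), 3))  # 0.222 (E, H, G misclassified)
print(round(stump_error(+1, 0.25), 3))  # 0.074 (only G misclassified)
```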

60. The Best E2 Classifier (round 2)
• Instances ordered by E2 with the new weights: G(-0.9, .074), A(-0.6, .48), B(-0.5, .074), D(-0.4, .074), C(-0.1, .074), F(-0.1, .074), E(0.4, .074), H(0.5, .074)
• Sign = -1, error = 0.148

61. The Best Classifier (round 2)
• The best classifier in round 2 is again a stump on E1: if wt(E1) > 0.25, face = true (sign = +1)
• Error = 0.074; α_2 = ½ ln((1 - 0.074) / 0.074) ≈ 1.26
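After two rounds, the strong classifier combines the two stumps with their alphas. A sketch with the numbers above (the round-2 threshold of 0.25 follows from the stated error of 0.074; note that A, which the first stump misses, is recovered by the more heavily weighted second stump):

```python
import numpy as np

def h1(e1): return 1 if e1 > 0.45 else -1  # round-1 stump, alpha_1 = 0.97
def h2(e1): return 1 if e1 > 0.25 else -1  # round-2 stump, alpha_2 = 1.26

def H(e1):
    # H(x) = sign(alpha_1*h1(x) + alpha_2*h2(x))
    return int(np.sign(0.97 * h1(e1) + 1.26 * h2(e1)))

print(H(0.3))  # -> 1: instance A (a face, wt(E1) = 0.3) is now classified correctly
```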
