

  1. Learning From Data, Lecture 23. SVMs: Maximizing the Margin
     A Better Hyperplane; Maximizing the Margin; Link to Regularization
     M. Magdon-Ismail, CSCI 4100/6100

  2. Recap: Linear Models, RBFs, Neural Networks
     Linear model with nonlinear transform:  h(x) = θ( w_0 + Σ_{j=1}^{d̃} w_j Φ_j(x) )
     Neural network:                         h(x) = θ( w_0 + Σ_{j=1}^{m} w_j θ(v_j^T x) )        [fit by gradient descent]
     k-RBF-network:                          h(x) = θ( w_0 + Σ_{j=1}^{k} w_j φ(||x − µ_j||) )    [centers by k-means]
     The neural network generalizes the linear model by adding layers.
     The support vector machine is a more 'robust' linear model.
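As a concrete reference for the three hypothesis forms above, here is a minimal numpy sketch (not from the lecture), taking the output nonlinearity θ to be sign; the transform Phi, the hidden-layer weights V, the RBF centers and all weights are placeholder arguments.

```python
import numpy as np

theta = np.sign  # output nonlinearity

def linear_model(x, w0, w, Phi):
    # h(x) = theta( w0 + sum_j w_j * Phi_j(x) )
    return theta(w0 + w @ Phi(x))

def neural_network(x, w0, w, V):
    # h(x) = theta( w0 + sum_j w_j * theta(v_j^T x) ), one hidden layer
    return theta(w0 + w @ theta(V @ x))

def rbf_network(x, w0, w, centers, phi=lambda r: np.exp(-r**2)):
    # h(x) = theta( w0 + sum_j w_j * phi(||x - mu_j||) )
    r = np.linalg.norm(x - centers, axis=1)
    return theta(w0 + w @ phi(r))
```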

  3. Which Separator Do You Pick?
     Being robust to noise (measurement error) is good (remember regularization).

  4. Robustness to Noisy Data
     Being robust to noise (measurement error) is good (remember regularization).

  5. Thicker Cushion Means More Robustness
     We call such hyperplanes "fat".

  6. Two Crucial Questions
     1. Can we efficiently find the fattest separating hyperplane?
     2. Is a fatter hyperplane better than a thin one?

  7. Pulling Out the Bias
     Before:  x ∈ {1} × R^d,  w ∈ R^{d+1}
              x = (1, x_1, ..., x_d)^T,  w = (w_0, w_1, ..., w_d)^T
              signal = w^T x
     Now:     x ∈ R^d,  b ∈ R,  w ∈ R^d
              x = (x_1, ..., x_d)^T,  w = (w_1, ..., w_d)^T,  bias b (the old w_0)
              signal = w^T x + b

  8. Separating the Data
     Hyperplane h = (b, w).
     h separates the data means:  y_n (w^T x_n + b) > 0  for every n
     (that is, w^T x_n + b > 0 when y_n = +1 and w^T x_n + b < 0 when y_n = -1).
     By rescaling the weights and bias we can require
         min_{n=1,...,N}  y_n (w^T x_n + b) = 1
     (renormalize the weights so that the size of the signal w^T x + b is meaningful).
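The rescaling step above takes only a couple of lines of numpy. This is an illustrative sketch, not code from the lecture, and `normalize_separator` is a made-up helper name.

```python
import numpy as np

# Rescale a separating (b, w) so that min_n y_n (w^T x_n + b) = 1.
# Any positive rescaling of (b, w) describes the same hyperplane.
def normalize_separator(X, y, b, w):
    rho = np.min(y * (X @ w + b))
    assert rho > 0, "(b, w) must separate the data"
    return b / rho, w / rho
```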

  9. Distance to the Hyperplane
     w is normal to the hyperplane: for any x_1, x_2 on the hyperplane,
         w^T (x_2 - x_1) = w^T x_2 - w^T x_1 = -b + b = 0
     (because w^T x = -b on the hyperplane).
     Unit normal: u = w / ||w||.
     dist(x, h) = |u^T (x - x_1)| = (1/||w||) |w^T x - w^T x_1| = (1/||w||) |w^T x + b|.

  10. Fatness of a Separating Hyperplane
      dist(x, h) = (1/||w||) |w^T x + b|.
      Since |w^T x_n + b| = |y_n (w^T x_n + b)| = y_n (w^T x_n + b) for a separating hyperplane,
          dist(x_n, h) = (1/||w||) y_n (w^T x_n + b).
      Fatness = distance to the closest point:
          min_n dist(x_n, h) = (1/||w||) min_n y_n (w^T x_n + b)    <- separation condition
                             = 1/||w||                              <- the margin γ(h)
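A small numpy sketch of these two quantities (the helper names and the numbers in the example are made up for illustration):

```python
import numpy as np

# dist(x, h) = |w^T x + b| / ||w||  (slide 9)
def dist_to_hyperplane(x, b, w):
    return np.abs(w @ x + b) / np.linalg.norm(w)

# margin of a separating hyperplane = min_n y_n (w^T x_n + b) / ||w||  (slide 10)
def margin(X, y, b, w):
    return np.min(y * (X @ w + b)) / np.linalg.norm(w)

# example: w = (3, 4), b = -5; the point (2, 1) is at distance |6 + 4 - 5| / 5 = 1
print(dist_to_hyperplane(np.array([2.0, 1.0]), -5.0, np.array([3.0, 4.0])))  # 1.0
```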

  11. Maximizing the Margin
      margin γ(h) = 1/||w||     <- the bias b does not appear here
      Maximize the margin by minimizing ||w||:
          minimize_{b,w}  (1/2) w^T w
          subject to:     min_{n=1,...,N} y_n (w^T x_n + b) = 1.

  12. Maximizing the Margin: Equivalent Form
      The constraint min_n y_n (w^T x_n + b) = 1 can be relaxed to inequalities:
          minimize_{b,w}  (1/2) w^T w
          subject to:     y_n (w^T x_n + b) ≥ 1  for n = 1, ..., N.
      (At the optimum at least one constraint is tight; otherwise (b, w) could be scaled down, further reducing w^T w.)

  13. Example: Our Toy Data Set
      X = [ (0,0); (2,2); (2,0); (3,0) ],   y = (-1, -1, +1, +1)
      The constraints y_n (w^T x_n + b) ≥ 1 become:
          (i)    -b ≥ 1
          (ii)   -(2 w_1 + 2 w_2 + b) ≥ 1
          (iii)  2 w_1 + b ≥ 1
          (iv)   3 w_1 + b ≥ 1
      (i) and (iii) give w_1 ≥ 1; (ii) and (iii) give w_2 ≤ -1.
      So (1/2)(w_1^2 + w_2^2) ≥ 1, attained at (b, w_1, w_2) = (-1, 1, -1).
      Optimal hyperplane: g(x) = sign(x_1 - x_2 - 1), with margin 1/||w*|| = 1/sqrt(2) ≈ 0.707.
      Data points (i), (ii) and (iii) satisfy y_n (w*^T x_n + b*) = 1: they are the support vectors.
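A quick numerical check of this hand calculation, a sketch assuming the candidate (b, w) = (-1, (1, -1)) derived above:

```python
import numpy as np

# With (b, w1, w2) = (-1, 1, -1), every constraint y_n (w^T x_n + b) >= 1 holds,
# with equality on points (i), (ii), (iii) -- the support vectors -- and the
# margin is 1/sqrt(2).
X = np.array([[0., 0.], [2., 2.], [2., 0.], [3., 0.]])
y = np.array([-1., -1., 1., 1.])
b, w = -1.0, np.array([1.0, -1.0])

print(y * (X @ w + b))        # [1. 1. 1. 2.]  -> the first three constraints are tight
print(1 / np.linalg.norm(w))  # 0.7071...
```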

  14. Quadratic Programming
      The standard QP problem:
          minimize_{u ∈ R^q}  (1/2) u^T Q u + p^T u
          subject to:         A u ≥ c
      Solution: u* <- QP(Q, p, A, c).
      (Q = 0 gives linear programming.)

  15. Maximum Margin Hyperplane is a QP
      SVM problem:
          minimize_{b,w}  (1/2) w^T w
          subject to:     y_n (w^T x_n + b) ≥ 1  for n = 1, ..., N
      Standard QP:
          minimize_{u ∈ R^q}  (1/2) u^T Q u + p^T u
          subject to:         A u ≥ c
      Set u = [b; w] ∈ R^{d+1}. Then
          (1/2) w^T w = (1/2) u^T Q u   with   Q = [ 0    0_d^T ],   p = 0_{d+1}
                                                   [ 0_d  I_d   ]
          y_n (w^T x_n + b) ≥ 1  ≡  y_n [1  x_n^T] u ≥ 1,   so
          A = [ y_1  y_1 x_1^T ]      c = 1_N
              [ ...  ...       ]
              [ y_N  y_N x_N^T ]

  16. Back To Our Example
      Exercise. For the toy data set X = [ (0,0); (2,2); (2,0); (3,0) ], y = (-1, -1, +1, +1),
      the constraints y_n (w^T x_n + b) ≥ 1 are (i)-(iv) of slide 13. Show that
          Q = [ 0 0 0 ]     p = [ 0 ]     A = [ -1  0  0 ]     c = [ 1 ]
              [ 0 1 0 ]         [ 0 ]         [ -1 -2 -2 ]         [ 1 ]
              [ 0 0 1 ]         [ 0 ]         [  1  2  0 ]         [ 1 ]
                                              [  1  3  0 ]         [ 1 ]
      Use your QP solver to obtain (b*, w_1*, w_2*) = (-1, 1, -1).
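One way to answer the first part of the exercise is to build Q, p, A, c directly from (X, y) with numpy and compare with the matrices above. This is an illustrative sketch, not the lecture's code.

```python
import numpy as np

# QP ingredients of slide 15 for the toy data set (d = 2, N = 4, u = [b, w1, w2]).
X = np.array([[0., 0.], [2., 2.], [2., 0.], [3., 0.]])
y = np.array([-1., -1., 1., 1.])
N, d = X.shape

Q = np.zeros((d + 1, d + 1))
Q[1:, 1:] = np.eye(d)                        # penalize w only, not the bias b
p = np.zeros(d + 1)
A = np.hstack([y[:, None], y[:, None] * X])  # row n is [y_n, y_n x_n^T]
c = np.ones(N)

# A's rows are (-1, 0, 0), (-1, -2, -2), (1, 2, 0), (1, 3, 0), matching the slide.
```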

  17. Primal QP Algorithm for the Linear SVM
      1: Let p = 0_{d+1} (the (d+1)-vector of zeros) and c = 1_N (the N-vector of ones).
         Construct the matrices Q and A, where
             Q = [ 0    0_d^T ]       A = [ y_1  y_1 x_1^T ]   <- the signed data matrix
                 [ 0_d  I_d   ]           [ ...  ...       ]
                                          [ y_N  y_N x_N^T ]
      2: Return [b*; w*] = u* <- QP(Q, p, A, c).
      3: The final hypothesis is g(x) = sign(w*^T x + b*).
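Below is a minimal sketch of this algorithm in Python. A dedicated QP(Q, p, A, c) routine is assumed to be unavailable, so scipy's general-purpose SLSQP solver stands in for it; `svm_primal` is a made-up name, and running it on the toy data should recover approximately (b*, w*) = (-1, (1, -1)).

```python
import numpy as np
from scipy.optimize import minimize

def svm_primal(X, y):
    """Hard-margin linear SVM via the primal QP of this slide.

    Minimizes (1/2) u^T Q u subject to A u >= c with u = [b; w].
    scipy's SLSQP is used here as a stand-in for a dedicated QP solver.
    """
    N, d = X.shape
    Q = np.zeros((d + 1, d + 1))
    Q[1:, 1:] = np.eye(d)                        # penalize w only, not the bias
    A = np.hstack([y[:, None], y[:, None] * X])  # signed data matrix
    c = np.ones(N)

    res = minimize(lambda u: 0.5 * u @ Q @ u,
                   np.zeros(d + 1),
                   method='SLSQP',
                   constraints=[{'type': 'ineq', 'fun': lambda u: A @ u - c}])
    return res.x[0], res.x[1:]                   # b*, w*

# Toy data set from slide 13: should give roughly b = -1, w = (1, -1).
X = np.array([[0., 0.], [2., 2.], [2., 0.], [3., 0.]])
y = np.array([-1., -1., 1., 1.])
print(svm_primal(X, y))
```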

  18. Example: SVM vs PLA
      [Figure: E_out of the SVM compared with E_out of PLA; the E_out axis runs from 0 to 0.08.]
      PLA depends on the ordering of the data (e.g. a random ordering).
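The comparison behind this figure can be reproduced roughly as follows. This is a sketch under several assumptions not in the lecture: a random linear target on [-1, 1]^2, 20 training points, and scikit-learn's SVC with a very large C used as a stand-in for the hard-margin QP above.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def random_target(d=2):
    # A random linear target f(x) = sign(w^T x + b).
    w, b = rng.standard_normal(d), rng.standard_normal()
    return lambda X: np.sign(X @ w + b)

def pla(X, y, max_iters=10_000):
    # Perceptron learning algorithm; the result depends on the (random)
    # order in which misclassified points are visited.
    N, d = X.shape
    Xb = np.hstack([np.ones((N, 1)), X])
    w = np.zeros(d + 1)
    for _ in range(max_iters):
        mis = np.where(np.sign(Xb @ w) != y)[0]
        if len(mis) == 0:
            break
        i = rng.choice(mis)
        w += y[i] * Xb[i]
    return lambda Xq: np.sign(np.hstack([np.ones((len(Xq), 1)), Xq]) @ w)

f = random_target()
while True:                                   # redraw until both classes appear
    X = rng.uniform(-1, 1, size=(20, 2))
    y = f(X)
    if len(np.unique(y)) == 2:
        break

X_test = rng.uniform(-1, 1, size=(10_000, 2))
y_test = f(X_test)

g_pla = pla(X, y)
svm = SVC(kernel='linear', C=1e10).fit(X, y)  # very large C approximates the hard margin

print("E_out(PLA):", np.mean(g_pla(X_test) != y_test))
print("E_out(SVM):", np.mean(svm.predict(X_test) != y_test))
```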

  19. Link to Regularization
                               optimize      constraint
      regularization:          E_in          w^T w ≤ C
      optimal hyperplane:      w^T w         E_in = 0
      Regularization minimizes E_in(w) subject to w^T w ≤ C; the optimal hyperplane minimizes
      w^T w subject to E_in = 0. The objective and the constraint swap roles.
