Learning From Data, Lecture 23
SVMs: Maximizing the Margin

A Better Hyperplane
Maximizing the Margin
Link to Regularization

M. Magdon-Ismail, CSCI 4100/6100
recap: Linear Models, RBFs, Neural Networks

Linear model with nonlinear transform:
    h(x) = θ( w_0 + Σ_{j=1}^{d̃} w_j Φ_j(x) )

Neural network (fit by gradient descent):
    h(x) = θ( w_0 + Σ_{j=1}^{m} w_j θ(v_jᵗx) )

k-RBF-network (centers by k-means):
    h(x) = θ( w_0 + Σ_{j=1}^{k} w_j φ(||x − µ_j||) )

Neural network: generalization of the linear model by adding layers.
Support vector machine: a more 'robust' linear model.
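As a quick illustration, here is a minimal Python sketch of the three hypothesis forms above, taking θ = sign; Phi, V, mu, phi, and the weights are illustrative placeholders, not objects defined in the lecture.

```python
import numpy as np

theta = np.sign  # the output nonlinearity used on this slide

def linear_transform_h(x, w0, w, Phi):
    # h(x) = theta(w0 + sum_j w_j Phi_j(x)); Phi maps x to a d~-vector
    return theta(w0 + w @ Phi(x))

def neural_net_h(x, w0, w, V):
    # h(x) = theta(w0 + sum_j w_j theta(v_j' x)); V has the v_j as rows
    return theta(w0 + w @ theta(V @ x))

def rbf_h(x, w0, w, mu, phi):
    # h(x) = theta(w0 + sum_j w_j phi(||x - mu_j||)); mu has the centers as rows
    return theta(w0 + w @ phi(np.linalg.norm(x - mu, axis=1)))
```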
Which Separator Do You Pick?

[Figure: linearly separable data with several candidate separating hyperplanes.]

Being robust to noise (measurement error) is good (remember regularization).
Robustness to Noisy Data

[Figure: the candidate separators after the data points are perturbed by noise.]

Being robust to noise (measurement error) is good (remember regularization).
Thicker Cushion Means More Robustness

[Figure: a separating hyperplane with a thick cushion (margin) on either side.]

We call such hyperplanes fat.
Two Crucial Questions

1. Can we efficiently find the fattest separating hyperplane?
2. Is a fatter hyperplane better than a thin one?
Pulling Out the Bias

Before:  x ∈ {1} × R^d,  w ∈ R^{d+1}
    x = (1, x_1, ..., x_d)ᵗ,  w = (w_0, w_1, ..., w_d)ᵗ   (w_0 is the bias)
    signal = wᵗx

Now:  x ∈ R^d,  b ∈ R (bias),  w ∈ R^d
    x = (x_1, ..., x_d)ᵗ,  w = (w_1, ..., w_d)ᵗ
    signal = wᵗx + b
Separating the Data

Hyperplane h = (b, w). h separates the data means:

    wᵗx_n + b > 0 when y_n = +1,
    wᵗx_n + b < 0 when y_n = −1,

that is, y_n(wᵗx_n + b) > 0 for all n. By rescaling the weights and bias, we can take

    min_{n=1,...,N} y_n(wᵗx_n + b) = 1

(renormalize the weights so that the size of the signal wᵗx + b is meaningful).
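A minimal sketch of this rescaling step (assuming numpy; `normalize` is an illustrative helper, not from the lecture): dividing (b, w) by ρ = min_n y_n(wᵗx_n + b) leaves the hyperplane unchanged but makes the minimum signal exactly 1.

```python
import numpy as np

def normalize(b, w, X, y):
    # rho > 0 for any (b, w) that separates the data (X: N x d, y: +/-1)
    rho = np.min(y * (X @ w + b))
    # dividing by rho does not move the hyperplane; it only rescales the signal
    return b / rho, w / rho
```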
Distance to the Hyperplane

w is normal to the hyperplane: for any x_1, x_2 on the hyperplane,

    wᵗ(x_2 − x_1) = wᵗx_2 − wᵗx_1 = −b + b = 0

(because wᵗx = −b on the hyperplane). Unit normal: u = w/||w||.

    dist(x, h) = |uᵗ(x − x_1)|
               = (1/||w||) · |wᵗx − wᵗx_1|
               = (1/||w||) · |wᵗx + b|
Fatness of a Separating Hyperplane

    dist(x, h) = (1/||w||) · |wᵗx + b|

Since |wᵗx_n + b| = |y_n(wᵗx_n + b)| = y_n(wᵗx_n + b),

    dist(x_n, h) = (1/||w||) · y_n(wᵗx_n + b).

Fatness = distance to the closest point:

    fatness = min_n dist(x_n, h)
            = (1/||w||) · min_n y_n(wᵗx_n + b)   ← separation condition
            = 1/||w||                             ← the margin γ(h)
Maximizing the Margin

    margin γ(h) = 1/||w||   ← the bias b does not appear here

Maximizing the margin means minimizing ||w||:

    minimize_{b,w}  (1/2)wᵗw
    subject to:     min_{n=1,...,N} y_n(wᵗx_n + b) = 1.

An equivalent form (at the optimum some constraint must hold with equality, otherwise w could be shrunk, so relaxing the equality to an inequality does not change the solution):

    minimize_{b,w}  (1/2)wᵗw
    subject to:     y_n(wᵗx_n + b) ≥ 1  for n = 1, ..., N.
Example – Our Toy Data Set

    X = [ 0 0 ; 2 2 ; 2 0 ; 3 0 ],   y = [ −1 ; −1 ; +1 ; +1 ]

The constraints y_n(wᵗx_n + b) ≥ 1:

    (i)    −b ≥ 1
    (ii)   −(2w_1 + 2w_2 + b) ≥ 1
    (iii)  2w_1 + b ≥ 1
    (iv)   3w_1 + b ≥ 1

(i) and (iii) give w_1 ≥ 1; (ii) and (iii) give w_2 ≤ −1. So

    (1/2)(w_1² + w_2²) ≥ 1,

with equality at (b = −1, w_1 = 1, w_2 = −1).

Optimal hyperplane: g(x) = sign(x_1 − x_2 − 1);  margin 1/||w*|| = 1/√2 ≈ 0.707.

For data points (i), (ii) and (iii), y_n(w*ᵗx_n + b*) = 1: these are the support vectors.
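A quick numeric check of the claimed margin (a sketch assuming numpy): compute min_n y_n(wᵗx_n + b)/||w|| for the optimal (b, w).

```python
import numpy as np

X = np.array([[0., 0.], [2., 2.], [2., 0.], [3., 0.]])
y = np.array([-1., -1., 1., 1.])
b, w = -1.0, np.array([1., -1.])   # the optimal hyperplane from above

# signed distance of each point to the hyperplane
dists = y * (X @ w + b) / np.linalg.norm(w)
print(dists)        # [0.707, 0.707, 0.707, 1.414]: first three are the support vectors
print(dists.min())  # the margin, 1/sqrt(2) ~ 0.707
```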
Quadratic Programming

    minimize_{u ∈ R^q}  (1/2)uᵗQu + pᵗu
    subject to:         Au ≥ c

    u* ← QP(Q, p, A, c)

(Q = 0 is linear programming.)
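QP(Q, p, A, c) above is an abstract interface; here is a minimal sketch of one possible realization on top of the cvxopt package (an assumption, not something the lecture specifies). cvxopt's solver minimizes (1/2)uᵗPu + qᵗu subject to Gu ≤ h, so the constraint Au ≥ c is passed as −Au ≤ −c.

```python
import numpy as np
from cvxopt import matrix, solvers

def QP(Q, p, A, c):
    # Inputs are float numpy arrays. cvxopt solves:
    #   minimize (1/2) u'Pu + q'u  subject to  G u <= h,
    # so A u >= c is rewritten as -A u <= -c.
    sol = solvers.qp(matrix(Q), matrix(p), matrix(-A), matrix(-c))
    return np.array(sol['x']).ravel()
```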
Maximum Margin Hyperplane is QP

The margin problem
    minimize_{b,w}  (1/2)wᵗw
    subject to:     y_n(wᵗx_n + b) ≥ 1  for n = 1, ..., N
matches the QP template
    minimize_{u ∈ R^{d+1}}  (1/2)uᵗQu + pᵗu
    subject to:             Au ≥ c
by taking u = (b; w) ∈ R^{d+1}:

    (1/2)wᵗw = (1/2)uᵗ [ 0  0_dᵗ ; 0_d  I_d ] u
        ⇒  Q = [ 0  0_dᵗ ; 0_d  I_d ],  p = 0_{d+1}

    y_n(wᵗx_n + b) ≥ 1  ≡  y_n (1, x_nᵗ) u ≥ 1
        ⇒  A has rows [ y_n  y_n x_nᵗ ] for n = 1, ..., N,  c = 1_N
Back To Our Example

Exercise:

    X = [ 0 0 ; 2 2 ; 2 0 ; 3 0 ],   y = [ −1 ; −1 ; +1 ; +1 ]

    Constraints y_n(wᵗx_n + b) ≥ 1:
    (i) −b ≥ 1,  (ii) −(2w_1 + 2w_2 + b) ≥ 1,  (iii) 2w_1 + b ≥ 1,  (iv) 3w_1 + b ≥ 1.

Show that

    Q = [ 0 0 0 ; 0 1 0 ; 0 0 1 ],   p = [ 0 ; 0 ; 0 ],
    A = [ −1 0 0 ; −1 −2 −2 ; 1 2 0 ; 1 3 0 ],   c = [ 1 ; 1 ; 1 ; 1 ].

Use your QP-solver to give (b*, w_1*, w_2*) = (−1, 1, −1).
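Working the exercise numerically, a sketch that reuses the hypothetical QP helper from the quadratic-programming sketch above:

```python
import numpy as np

Q = np.diag([0., 1., 1.])           # (1/2)u'Qu = (1/2)(w1^2 + w2^2), u = (b, w1, w2)
p = np.zeros(3)
A = np.array([[-1.,  0.,  0.],      # (i)   -b >= 1
              [-1., -2., -2.],      # (ii)  -(2w1 + 2w2 + b) >= 1
              [ 1.,  2.,  0.],      # (iii)  2w1 + b >= 1
              [ 1.,  3.,  0.]])     # (iv)   3w1 + b >= 1
c = np.ones(4)

b, w1, w2 = QP(Q, p, A, c)          # QP wrapper from the earlier sketch
print(b, w1, w2)                    # approximately (-1, 1, -1)
```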
Primal QP Algorithm for Linear-SVM

1: Let p = 0_{d+1}, the (d+1)-vector of zeros, and c = 1_N, the N-vector of ones. Construct matrices Q and A, where A has rows [ y_n  y_n x_nᵗ ] (the signed data matrix):

    Q = [ 0  0_dᵗ ; 0_d  I_d ],   A = [ y_1  y_1 x_1ᵗ ; ... ; y_N  y_N x_Nᵗ ]

2: (b*; w*) = u* ← QP(Q, p, A, c).
3: The final hypothesis is g(x) = sign(w*ᵗx + b*).
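Putting the three steps together, a minimal sketch (again leaning on the hypothetical QP wrapper from the earlier sketch; `svm_primal` is an illustrative name):

```python
import numpy as np

def svm_primal(X, y):
    """Hard-margin linear SVM via the primal QP. X: N x d floats, y: +/-1."""
    N, d = X.shape
    Q = np.zeros((d + 1, d + 1))
    Q[1:, 1:] = np.eye(d)            # (1/2)u'Qu = (1/2)w'w with u = (b; w)
    p = np.zeros(d + 1)
    yc = y.reshape(-1, 1)
    A = np.hstack([yc, yc * X])      # signed data matrix: rows [y_n, y_n x_n']
    c = np.ones(N)
    u = QP(Q, p, A, c)               # QP wrapper from the earlier sketch
    return u[0], u[1:]               # b*, w*

def g(x, b, w):
    # final hypothesis g(x) = sign(w'x + b)
    return np.sign(w @ x + b)
```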
Example: SVM vs PLA

[Figure: scatter plot of E_out(PLA) versus E_out(SVM), with E_out ranging from 0 to about 0.08.]

PLA depends on the ordering of the data (e.g. random ordering).
Link to Regularization

Regularization:
    minimize_w  E_in(w)
    subject to: wᵗw ≤ C.

                   regularization     optimal hyperplane
    minimize:      E_in               wᵗw
    subject to:    wᵗw ≤ C            E_in = 0