Support Vector Machines


  1. DM825 Introduction to Machine Learning. Lecture 8: Support Vector Machines. Marco Chiarandini, Department of Mathematics & Computer Science, University of Southern Denmark.

  2. Overview. Support Vector Machines:
     1. Functional and Geometric Margins
     2. Optimal Margin Classifier
     3. Lagrange Duality
     4. Karush Kuhn Tucker Conditions
     5. Solving the Optimal Margin
     6. Kernels
     7. Soft Margins
     8. SMO Algorithm

  3. In This Lecture:
     1. Functional and Geometric Margins
     2. Optimal Margin Classifier
     3. Lagrange Duality
     4. Karush Kuhn Tucker Conditions
     5. Solving the Optimal Margin

  4. Introduction
     - Binary classification: $y \in \{-1, 1\}$ (instead of $\{0, 1\}$ as in GLMs).
     - Let $h(\vec\theta, \vec x)$ output values in $\{-1, 1\}$:
       $$f(z) = \operatorname{sign}(z) = \begin{cases} 1 & \text{if } z \ge 0 \\ -1 & \text{if } z < 0 \end{cases}$$
       (hence no probabilities, unlike logistic regression).
     - $h(\vec x) = f(\vec\theta^T \vec x + \theta_0)$, with $\vec x \in \mathbb{R}^n$, $\vec\theta \in \mathbb{R}^n$, $\theta_0 \in \mathbb{R}$.
     - Assume for now that the training set is linearly separable.
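
A minimal sketch of this classifier in Python (names like `h` and `theta` follow the slides; the example data is made up for illustration):

```python
import numpy as np

def h(theta, theta0, x):
    """Linear classifier h(x) = sign(theta^T x + theta0), returning -1 or +1.

    Note: np.sign returns 0 at exactly 0; the lecture's convention maps
    z >= 0 to +1, so that case is handled explicitly.
    """
    z = theta @ x + theta0
    return 1 if z >= 0 else -1

# Example: a separating hyperplane in R^2
theta, theta0 = np.array([1.0, -1.0]), 0.5
print(h(theta, theta0, np.array([2.0, 0.0])))   # +1
print(h(theta, theta0, np.array([-2.0, 1.0])))  # -1
```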

  5. SVMs determine the model parameters by solving a convex optimization problem, hence a locally optimal solution is also globally optimal. Margin: the smallest distance between the decision boundary and any of the samples. The location of the boundary is determined by a subset of the data points, known as support vectors.

  6. Outline:
     1. Functional and Geometric Margins
     2. Optimal Margin Classifier
     3. Lagrange Duality
     4. Karush Kuhn Tucker Conditions
     5. Solving the Optimal Margin

  7. Recap
     - Functional margin: $\hat\gamma_i = y_i(\vec\theta^T \vec x_i + \theta_0)$, $\hat\gamma = \min_i \hat\gamma_i$; requires a normalization condition.
     - Geometric margin: $\gamma_i = y_i\left(\frac{\vec\theta^T}{\|\vec\theta\|} \vec x_i + \frac{\theta_0}{\|\vec\theta\|}\right)$, $\gamma = \min_i \gamma_i$; scale invariant.
     - $\gamma = \hat\gamma / \|\vec\theta\|$.
     - If $\|\vec\theta\| = 1$ then $\hat\gamma_i = \gamma_i$: the two margins correspond.
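
Both margins are one line of numpy each; a small illustration (the dataset and parameter values are made up):

```python
import numpy as np

def margins(theta, theta0, X, y):
    """Return (functional margin, geometric margin) over a dataset.

    functional: gamma_hat = min_i y_i (theta^T x_i + theta0)  -- scale dependent
    geometric:  gamma     = gamma_hat / ||theta||             -- scale invariant
    """
    gamma_hat = np.min(y * (X @ theta + theta0))
    return gamma_hat, gamma_hat / np.linalg.norm(theta)

X = np.array([[2.0, 2.0], [-1.0, -1.0]])
y = np.array([1, -1])
theta, theta0 = np.array([1.0, 1.0]), 0.0

print(margins(theta, theta0, X, y))            # (2.0, 1.414...)
print(margins(10 * theta, 10 * theta0, X, y))  # functional scales by 10, geometric unchanged
```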

  8. Outline:
     1. Functional and Geometric Margins
     2. Optimal Margin Classifier
     3. Lagrange Duality
     4. Karush Kuhn Tucker Conditions
     5. Solving the Optimal Margin

  9. Optimization Problem
     Looking at the geometric margin:
     $$\text{(OPT1)}: \quad \max_{\gamma, \vec\theta, \theta_0} \gamma \quad \text{s.t.} \quad y_i(\vec\theta^T \vec x_i + \theta_0) \ge \gamma \ \ \forall i = 1, \ldots, m, \qquad \|\vec\theta\| = 1$$
     Alternatively, looking at functional margins and recalling that $\gamma = \hat\gamma / \|\vec\theta\|$:
     $$\text{(OPT2)}: \quad \max_{\hat\gamma, \vec\theta, \theta_0} \frac{\hat\gamma}{\|\vec\theta\|} \quad \text{s.t.} \quad y_i(\vec\theta^T \vec x_i + \theta_0) \ge \hat\gamma \ \ \forall i = 1, \ldots, m$$

  10. The functional margin can be rescaled arbitrarily without changing the classifier (the geometric margin has no scaling problem), so we can fix $\hat\gamma = 1$:
      $$\text{(OPT3)}: \quad \min_{\vec\theta, \theta_0} \frac{1}{2}\|\vec\theta\|^2 \quad \text{s.t.} \quad y_i(\vec\theta^T \vec x_i + \theta_0) \ge 1 \ \ \forall i = 1, \ldots, m$$
      where we used that $\max 1/\|\vec\theta\|$ is equivalent to $\min \|\vec\theta\|$, and replaced $\|\vec\theta\| = \sqrt{\vec\theta^T \vec\theta}$ by its square because the square root is monotonic. This is a convex optimization problem with a convex quadratic objective and linear constraints, hence it can be solved optimally and efficiently.
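
(OPT3) can be handed directly to an off-the-shelf convex solver. A sketch assuming the cvxpy package and a made-up separable dataset:

```python
import cvxpy as cp
import numpy as np

# Toy linearly separable data (illustrative, not from the lecture)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

theta = cp.Variable(2)
theta0 = cp.Variable()

# (OPT3): minimize (1/2)||theta||^2  s.t.  y_i (theta^T x_i + theta0) >= 1 for all i
problem = cp.Problem(
    cp.Minimize(0.5 * cp.sum_squares(theta)),
    [cp.multiply(y, X @ theta + theta0) >= 1],
)
problem.solve()
print(theta.value, theta0.value)  # parameters of the maximum-margin hyperplane
```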

  11. Convex optimization problem
      $$\text{minimize } f_0(x) \quad \text{subject to } f_i(x) \le b_i, \ i = 1, \ldots, m$$
      where the objective and constraint functions are convex:
      $$f_i(\alpha x + \beta y) \le \alpha f_i(x) + \beta f_i(y) \quad \text{if } \alpha + \beta = 1, \ \alpha \ge 0, \ \beta \ge 0$$
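
For instance, the SVM objective $f(\vec\theta) = \|\vec\theta\|^2$ satisfies this inequality; a quick numerical spot-check (illustrative, assuming numpy):

```python
import numpy as np

f = lambda x: np.dot(x, x)  # f(x) = ||x||^2, a convex function

rng = np.random.default_rng(0)
for _ in range(1000):
    x, z = rng.normal(size=3), rng.normal(size=3)
    a = rng.uniform()  # alpha in [0, 1), beta = 1 - alpha
    assert f(a * x + (1 - a) * z) <= a * f(x) + (1 - a) * f(z) + 1e-12
print("convexity inequality held on all sampled points")
```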

  12. Outline:
     1. Functional and Geometric Margins
     2. Optimal Margin Classifier
     3. Lagrange Duality
     4. Karush Kuhn Tucker Conditions
     5. Solving the Optimal Margin

  13. Lagrangian
      Standard form problem (not necessarily convex):
      $$\text{minimize } f_0(x) \quad \text{subject to } f_i(x) \le 0, \ i = 1, \ldots, m, \quad h_i(x) = 0, \ i = 1, \ldots, p$$
      with variable $x \in \mathbb{R}^n$, domain $\mathcal{D}$, optimal value $p^*$.
      Lagrangian: $L : \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R}$, with $\operatorname{dom} L = \mathcal{D} \times \mathbb{R}^m \times \mathbb{R}^p$:
      $$L(x, \vec\alpha, \vec\beta) = f_0(x) + \sum_{i=1}^m \alpha_i f_i(x) + \sum_{i=1}^p \beta_i h_i(x)$$
      - a weighted sum of the objective and constraint functions;
      - $\alpha_i$ is the Lagrange multiplier associated with $f_i(x) \le 0$;
      - $\beta_i$ is the Lagrange multiplier associated with $h_i(x) = 0$;
      - $\vec\alpha$ and $\vec\beta$ are the dual or Lagrangian variables.
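
To make the definition concrete, consider a hypothetical one-dimensional instance (not from the lecture): minimize $f_0(x) = x^2$ subject to $f_1(x) = 1 - x \le 0$, whose optimal value is $p^* = 1$ at $x = 1$:

```python
# Toy problem: minimize f0(x) = x^2 subject to f1(x) = 1 - x <= 0.
# No equality constraints, so the Lagrangian has no beta terms.
f0 = lambda x: x**2
f1 = lambda x: 1 - x

def L(x, alpha):
    """Lagrangian L(x, alpha) = f0(x) + alpha * f1(x)."""
    return f0(x) + alpha * f1(x)

# At a feasible point, any alpha >= 0 can only decrease the value:
print(L(2.0, alpha=0.5))  # 4 + 0.5*(1 - 2) = 3.5 <= f0(2) = 4
```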

  14. Lagrange dual function
      $L_D : \mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R}$:
      $$L_D(\vec\alpha, \vec\beta) = \min_{x \in \mathcal{D}} L(x, \vec\alpha, \vec\beta) = \min_{x \in \mathcal{D}} \left( f_0(x) + \sum_{i=1}^m \alpha_i f_i(x) + \sum_{i=1}^p \beta_i h_i(x) \right)$$
      $L_D$ is concave; it can be $-\infty$ for some $\vec\alpha$ and $\vec\beta$.
      Lower bound property:
      1. $L_D(\vec\alpha, \vec\beta) \le p^*$ for all $\vec\alpha \ge 0$, $\vec\beta$;
      2. $\max_{\vec\alpha \ge 0, \vec\beta} L_D(\vec\alpha, \vec\beta) \le p^*$ (the best lower bound; it may equal $p^*$).
      Proof of (1): for any feasible $\tilde x$ and $\vec\alpha \ge 0$:
      $$L(\tilde x, \vec\alpha, \vec\beta) = f_0(\tilde x) + \sum_{i=1}^m \alpha_i f_i(\tilde x) + \sum_{i=1}^p \beta_i h_i(\tilde x) \le f_0(\tilde x)$$
      hence $L_D(\vec\alpha, \vec\beta) = \min_{x \in \mathcal{D}} L(x, \vec\alpha, \vec\beta) \le L(\tilde x, \vec\alpha, \vec\beta) \le f_0(\tilde x)$. (2) holds because (1) is true for any $\vec\alpha, \vec\beta$.
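
For the toy problem above, $\min_x \left( x^2 + \alpha(1 - x) \right)$ is attained at $x = \alpha/2$, giving $L_D(\alpha) = \alpha - \alpha^2/4$. A numerical check of the lower bound property (assuming numpy):

```python
import numpy as np

p_star = 1.0  # optimal value of: min x^2 s.t. x >= 1

def L_D(alpha):
    """Dual function of min x^2 s.t. 1 - x <= 0; inner minimizer is x = alpha/2."""
    return alpha - alpha**2 / 4

alphas = np.linspace(0, 5, 51)
assert np.all(L_D(alphas) <= p_star + 1e-12)  # weak duality: L_D(alpha) <= p*
print(max(L_D(a) for a in alphas))            # best bound: 1.0 at alpha = 2, equal to p*
```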

  15. If $f_0$ and the $f_i$ are convex and the $h_i$ affine (and a constraint qualification such as Slater's condition holds), then
      $$d^* = \max_{\vec\alpha \ge 0, \vec\beta} L_D(\vec\alpha, \vec\beta) = p^*$$
      so we can solve the dual in place of the primal.

  16. Outline:
     1. Functional and Geometric Margins
     2. Optimal Margin Classifier
     3. Lagrange Duality
     4. Karush Kuhn Tucker Conditions
     5. Solving the Optimal Margin

  17. Karush Kuhn Tucker Conditions
      Standard form problem (not necessarily convex):
      $$\text{minimize } f(x) \quad \text{subject to } g_i(x) \le b_i, \ i = 1, \ldots, m$$
      with variable $x \in \mathbb{R}^n$; $f, g$ nonlinear, $f : \mathbb{R}^n \to \mathbb{R}$, $g : \mathbb{R}^n \to \mathbb{R}^m$.
      Necessary conditions for optimality (valid locally):
      $$\begin{cases} \nabla f(x_0) + \sum_{i=1}^m \lambda_i \nabla g_i(x_0) = 0 \\ \lambda_i \ge 0 \quad \forall i \\ \sum_{i=1}^m \lambda_i \left( g_i(x_0) - b_i \right) = 0 \\ g_i(x_0) - b_i \le 0 \quad \forall i \end{cases}$$
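
The same toy problem can be cast in this form with $g(x) = -x$ and $b = -1$; the candidate $x_0 = 1$ with multiplier $\lambda = 2$ then satisfies all four conditions. A sketch of the check:

```python
# Toy problem again: minimize f(x) = x^2 subject to g(x) = -x <= b = -1 (i.e. x >= 1).
x0, lam, b = 1.0, 2.0, -1.0

grad_f = 2 * x0   # f'(x0) = 2
grad_g = -1.0     # g'(x0) = -1
g = -x0           # g(x0) = -1

assert grad_f + lam * grad_g == 0  # stationarity
assert lam >= 0                    # dual feasibility
assert lam * (g - b) == 0          # complementary slackness (constraint active)
assert g - b <= 0                  # primal feasibility
print("KKT conditions hold at x0 = 1")
```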

  18. Outline:
     1. Functional and Geometric Margins
     2. Optimal Margin Classifier
     3. Lagrange Duality
     4. Karush Kuhn Tucker Conditions
     5. Solving the Optimal Margin

  19. Let's go back to our problem:
      $$\text{(OPT3)}: \quad \min_{\vec\theta, \theta_0} \frac{1}{2}\|\vec\theta\|^2 \quad \text{s.t.} \quad y_i(\vec\theta^T \vec x_i + \theta_0) \ge 1 \ \ \forall i = 1, \ldots, m$$
      $$L(\vec\theta, \theta_0, \vec\alpha) = \frac{1}{2}\|\vec\theta\|^2 - \sum_{i=1}^m \alpha_i \left[ y_i(\vec\theta^T \vec x_i + \theta_0) - 1 \right]$$
      We find the dual form by minimizing over $\vec\theta, \theta_0$: $L_D(\vec\alpha) = \min_{\vec\theta, \theta_0} L(\vec\theta, \theta_0, \vec\alpha)$.
      $$\nabla_{\vec\theta} L(\vec\theta, \theta_0, \vec\alpha) = \vec\theta - \sum_{i=1}^m \alpha_i y_i \vec x_i = 0 \implies \vec\theta = \sum_{i=1}^m \alpha_i y_i \vec x_i$$
      $$\frac{\partial L(\vec\theta, \theta_0, \vec\alpha)}{\partial \theta_0} = -\sum_{i=1}^m \alpha_i y_i = 0 \implies \sum_{i=1}^m \alpha_i y_i = 0$$
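
These two stationarity conditions can be sanity-checked numerically: for any $\vec\alpha \ge 0$ with $\sum_i \alpha_i y_i = 0$, setting $\vec\theta = \sum_i \alpha_i y_i \vec x_i$ should minimize $L$ over $(\vec\theta, \theta_0)$, with $\theta_0$ then having no effect. A sketch with made-up data, assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 3
X, y = rng.normal(size=(m, n)), np.array([1, 1, 1, -1, -1, -1])

# alpha >= 0 with sum_i alpha_i y_i = 0 (equal mass on each class)
alpha = np.full(m, 0.5)

def L(theta, theta0):
    """Lagrangian of OPT3: 0.5||theta||^2 - sum_i alpha_i [y_i(theta^T x_i + theta0) - 1]."""
    return 0.5 * theta @ theta - np.sum(alpha * (y * (X @ theta + theta0) - 1))

theta_star = (alpha * y) @ X  # stationary point: theta = sum_i alpha_i y_i x_i
base = L(theta_star, 0.0)
for _ in range(100):          # random perturbations never go below the minimum
    d, d0 = rng.normal(size=n), rng.normal()
    assert L(theta_star + d, d0) >= base - 1e-9
print("theta* minimizes L over (theta, theta0) for this alpha")
```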

  20. Substituting into $L(\vec\theta, \theta_0, \vec\alpha)$:
      $$L_D(\vec\alpha) = \frac{1}{2} \left( \sum_{i=1}^m \alpha_i y_i \vec x_i \right)^T \left( \sum_{j=1}^m \alpha_j y_j \vec x_j \right) - \sum_{i=1}^m \alpha_i \left[ y_i \left( \left( \sum_{j=1}^m \alpha_j y_j \vec x_j \right)^T \vec x_i + \theta_0 \right) - 1 \right]$$
      $$= \sum_{i=1}^m \alpha_i - \frac{1}{2} \sum_{i=1}^m \sum_{j=1}^m y_i y_j \alpha_i \alpha_j \langle \vec x_i, \vec x_j \rangle$$
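
A sketch of solving this dual with cvxpy (assumed installed) on the same made-up data as the primal sketch above, then recovering $\vec\theta = \sum_i \alpha_i y_i \vec x_i$ and $\theta_0$ from a support vector, where the margin constraint is tight:

```python
import cvxpy as cp
import numpy as np

X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
m = len(y)

alpha = cp.Variable(m)
u = cp.multiply(y, alpha)  # u_i = y_i * alpha_i
# The double sum equals ||sum_i alpha_i y_i x_i||^2 = ||X^T u||^2,
# which keeps the objective in a form the solver recognizes as concave.
problem = cp.Problem(
    cp.Maximize(cp.sum(alpha) - 0.5 * cp.sum_squares(X.T @ u)),
    [alpha >= 0, y @ alpha == 0],
)
problem.solve()

a = alpha.value
theta = (a * y) @ X             # theta = sum_i alpha_i y_i x_i
sv = int(np.argmax(a))          # any i with alpha_i > 0 is a support vector
theta0 = y[sv] - X[sv] @ theta  # from y_sv (theta^T x_sv + theta0) = 1
print(theta, theta0)
```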
