Support Vector Machines
Greg Mori - CMPT 419/726
Bishop PRML Ch. 7

  1. Support Vector Machines (Greg Mori, CMPT 419/726, Bishop PRML Ch. 7)

  2. Outline
     • Maximum Margin: Criterion, Math
     • Maximizing the Margin
     • Non-Separable Data

  4. Linear Classification
     • Consider a two-class classification problem
     • Use a linear model $y(\mathbf{x}) = \mathbf{w}^T \phi(\mathbf{x}) + b$ followed by a threshold function
     • For now, assume the training data are linearly separable
     • Recall that the perceptron converges to a perfect classifier for such data
     • But there are many such perfect classifiers
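
  A minimal sketch of such a linear classifier (the function name and the identity feature map below are illustrative, not from the slides). It also shows the last point: several different (w, b) pairs classify the same toy set perfectly:

      import numpy as np

      def predict(w, b, x, phi=lambda v: v):
          """Linear model y(x) = w^T phi(x) + b followed by a sign threshold."""
          y = w @ phi(x) + b
          return 1 if y >= 0 else -1

      X = np.array([[2.0, 2.0], [-2.0, -2.0]])
      t = np.array([1, -1])
      # Two different perfect classifiers for the same data
      for w, b in [(np.array([1.0, 0.0]), 0.0), (np.array([1.0, 1.0]), 0.5)]:
          print([predict(w, b, x) for x in X], "targets:", t)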

  5. Max Margin
     [Figure: decision boundary $y = 0$ with margin boundaries $y = 1$ and $y = -1$]
     • Define the margin of a classifier as the minimum distance to any example
     • Support vector machines choose the decision boundary that maximizes the margin

  6. Marginal Geometry
     [Figure (cf. Ch. 4): regions $R_1$ ($y > 0$) and $R_2$ ($y < 0$) separated by $y = 0$, with the offsets $y(\mathbf{x})/\|\mathbf{w}\|$ and $-w_0/\|\mathbf{w}\|$ marked]
     • Recall from Ch. 4
     • The projection of $\mathbf{x}$ in the $\mathbf{w}$ direction is $\mathbf{w}^T\mathbf{x}/\|\mathbf{w}\|$
     • $y(\mathbf{x}) = 0$ when $\mathbf{w}^T\mathbf{x} = -b$, i.e. $\mathbf{w}^T\mathbf{x}/\|\mathbf{w}\| = -b/\|\mathbf{w}\|$
     • So $\mathbf{w}^T\mathbf{x}/\|\mathbf{w}\| = y(\mathbf{x})/\|\mathbf{w}\| - b/\|\mathbf{w}\|$: relative to the boundary offset $-b/\|\mathbf{w}\|$, the signed distance of $\mathbf{x}$ to the decision boundary is $y(\mathbf{x})/\|\mathbf{w}\|$
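
  As a quick numeric check of this geometry (made-up numbers, with $\phi$ taken as the identity), the signed distance $y(\mathbf{x})/\|\mathbf{w}\|$ is positive on one side of the boundary, zero on it, and negative on the other:

      import numpy as np

      w, b = np.array([3.0, 4.0]), -5.0            # boundary: 3*x1 + 4*x2 - 5 = 0

      def signed_distance(x):
          # y(x) / ||w||, the signed distance derived on the slide
          return (w @ x + b) / np.linalg.norm(w)

      print(signed_distance(np.array([3.0, 4.0])))    # 4.0  (positive side)
      print(signed_distance(np.array([0.4, 0.95])))   # 0.0  (on the boundary)
      print(signed_distance(np.array([0.0, 0.0])))    # -1.0 (negative side)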

  7. Support Vectors
     [Figure: maximum-margin boundary $y = 0$ with support vectors lying on $y = 1$ and $y = -1$]
     • Assuming the data are separated by the hyperplane, the distance of point $\mathbf{x}_n$ to the decision boundary is $t_n y(\mathbf{x}_n)/\|\mathbf{w}\|$
     • The maximum margin criterion chooses $\mathbf{w}, b$ by: $\arg\max_{\mathbf{w},b} \left\{ \frac{1}{\|\mathbf{w}\|} \min_n \left[ t_n (\mathbf{w}^T \phi(\mathbf{x}_n) + b) \right] \right\}$
     • Points attaining this minimum are known as support vectors
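
  A sketch of scoring a candidate (w, b) by this criterion on toy data (illustrative numbers; $\phi$ is the identity). The points achieving the minimum distance are the would-be support vectors:

      import numpy as np

      X = np.array([[2.0, 1.0], [3.0, 3.0], [-1.0, -3.0], [-2.0, -1.0]])
      t = np.array([1.0, 1.0, -1.0, -1.0])
      w, b = np.array([1.0, 1.0]), 0.0

      dists = t * (X @ w + b) / np.linalg.norm(w)   # t_n y(x_n) / ||w||
      print(dists.min())                            # margin of this (w, b)
      print(X[np.isclose(dists, dists.min())])      # points attaining the minimum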

  8. Canonical Representation
     • This optimization problem is complex: $\arg\max_{\mathbf{w},b} \left\{ \frac{1}{\|\mathbf{w}\|} \min_n \left[ t_n (\mathbf{w}^T \phi(\mathbf{x}_n) + b) \right] \right\}$
     • Note that rescaling $\mathbf{w} \to \kappa \mathbf{w}$ and $b \to \kappa b$ does not change the distance $t_n y(\mathbf{x}_n)/\|\mathbf{w}\|$ (many equivalent answers)
     • So for the point $\mathbf{x}^*$ closest to the surface, we can set $t^* (\mathbf{w}^T \phi(\mathbf{x}^*) + b) = 1$
     • All other points are at least this far away: $\forall n,\ t_n (\mathbf{w}^T \phi(\mathbf{x}_n) + b) \geq 1$
     • Under these constraints, the optimization becomes: $\arg\max_{\mathbf{w},b} \frac{1}{\|\mathbf{w}\|} = \arg\min_{\mathbf{w},b} \frac{1}{2}\|\mathbf{w}\|^2$
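
  The rescaling argument is easy to verify numerically (a sketch with made-up data): multiplying (w, b) by $\kappa$ changes the functional margins $t_n(\mathbf{w}^T\mathbf{x}_n + b)$ but not the geometric distances, and choosing $\kappa = 1 / \min_n t_n(\mathbf{w}^T\mathbf{x}_n + b)$ puts the closest point exactly at 1:

      import numpy as np

      X = np.array([[2.0, 1.0], [-1.0, -3.0]])
      t = np.array([1.0, -1.0])
      w, b = np.array([2.0, 2.0]), 0.0

      margins = t * (X @ w + b)                     # t_n (w^T x_n + b)
      print(margins / np.linalg.norm(w))            # geometric distances

      kappa = 1.0 / margins.min()                   # canonical rescaling
      w, b = kappa * w, kappa * b
      print(t * (X @ w + b))                        # closest point now exactly 1
      print(t * (X @ w + b) / np.linalg.norm(w))    # distances unchanged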

  12. Canonical Representation
     • So the optimization problem is now a constrained optimization problem: $\arg\min_{\mathbf{w},b} \frac{1}{2}\|\mathbf{w}\|^2 \quad \text{s.t.} \quad \forall n,\ t_n (\mathbf{w}^T \phi(\mathbf{x}_n) + b) \geq 1$
     • To solve this, we need to take a detour into Lagrange multipliers
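
  Before the Lagrangian machinery, note that this constrained problem can also be handed to a generic solver. A sketch using scipy's SLSQP on a small separable set (toy data; $\phi$ is the identity; in practice the kernelized dual is solved instead):

      import numpy as np
      from scipy.optimize import minimize

      X = np.array([[2.0, 2.0], [2.0, 0.0], [-2.0, -2.0], [0.0, -2.0]])
      t = np.array([1.0, 1.0, -1.0, -1.0])

      def objective(wb):                  # (1/2) ||w||^2, with wb = (w_1, w_2, b)
          w = wb[:-1]
          return 0.5 * w @ w

      constraints = [{'type': 'ineq',     # t_n (w^T x_n + b) - 1 >= 0
                      'fun': lambda wb, n=n: t[n] * (wb[:-1] @ X[n] + wb[-1]) - 1.0}
                     for n in range(len(X))]

      res = minimize(objective, x0=np.zeros(3), method='SLSQP',
                     constraints=constraints)
      print(res.x[:-1], res.x[-1])        # max-margin (w, b) for this toy set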

  13. Outline
     • Maximum Margin: Criterion, Math
     • Maximizing the Margin
     • Non-Separable Data

  14. Lagrange Multipliers
     [Figure: constraint surface $g(\mathbf{x}) = 0$ with $\nabla f(\mathbf{x})$ and $\nabla g(\mathbf{x})$ drawn at a point $\mathbf{x}_A$]
     Consider the problem: $\max_{\mathbf{x}} f(\mathbf{x}) \ \text{s.t.}\ g(\mathbf{x}) = 0$
     • Points on $g(\mathbf{x}) = 0$ must have $\nabla g(\mathbf{x})$ normal to the surface
     • A stationary point must have no change in $f$ along the surface, so $\nabla f(\mathbf{x})$ must also point in this same (normal) direction
     • So there must be some $\lambda$ such that $\nabla f(\mathbf{x}) + \lambda \nabla g(\mathbf{x}) = 0$
     • Define the Lagrangian: $L(\mathbf{x}, \lambda) = f(\mathbf{x}) + \lambda g(\mathbf{x})$
     • Stationary points of $L(\mathbf{x}, \lambda)$ satisfy $\nabla_{\mathbf{x}} L = \nabla f(\mathbf{x}) + \lambda \nabla g(\mathbf{x}) = 0$ and $\nabla_{\lambda} L = g(\mathbf{x}) = 0$
     • So they are stationary points of the constrained problem!
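
  A sketch of the same construction in sympy, on an illustrative problem not from the slides (maximize $f = x_1 + x_2$ on the unit circle $g = x_1^2 + x_2^2 - 1 = 0$): forming $L$ and setting its gradient to zero recovers the constrained stationary points:

      import sympy as sp

      x1, x2, lam = sp.symbols('x1 x2 lam', real=True)
      f = x1 + x2                    # objective (illustrative choice)
      g = x1**2 + x2**2 - 1          # constraint surface g = 0
      L = f + lam * g                # Lagrangian L(x, lambda) = f + lambda*g

      # grad_x L = 0 and grad_lambda L = g = 0
      sols = sp.solve([sp.diff(L, v) for v in (x1, x2, lam)], [x1, x2, lam],
                      dict=True)
      print(sols)   # +-(sqrt(2)/2, sqrt(2)/2); the maximum is the positive pair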

  18. Lagrange Multipliers Example
     [Figure: contours of $f$ with the constraint line $g(x_1, x_2) = 0$ and the optimum $(x_1^\star, x_2^\star)$]
     • Consider the problem: $\max_{\mathbf{x}} f(x_1, x_2) = 1 - x_1^2 - x_2^2 \ \text{s.t.}\ g(x_1, x_2) = x_1 + x_2 - 1 = 0$
     • Lagrangian: $L(\mathbf{x}, \lambda) = 1 - x_1^2 - x_2^2 + \lambda (x_1 + x_2 - 1)$
     • Stationary points require: $\partial L/\partial x_1 = -2 x_1 + \lambda = 0$, $\partial L/\partial x_2 = -2 x_2 + \lambda = 0$, $\partial L/\partial \lambda = x_1 + x_2 - 1 = 0$
     • So the stationary point is $(x_1^*, x_2^*) = (\frac{1}{2}, \frac{1}{2})$, with $\lambda = 1$
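
  The slide's system can be checked symbolically (a small sympy sketch):

      import sympy as sp

      x1, x2, lam = sp.symbols('x1 x2 lam', real=True)
      L = 1 - x1**2 - x2**2 + lam * (x1 + x2 - 1)

      # dL/dx1 = -2*x1 + lam, dL/dx2 = -2*x2 + lam, dL/dlam = x1 + x2 - 1
      sol = sp.solve([sp.diff(L, v) for v in (x1, x2, lam)], [x1, x2, lam])
      print(sol)    # {x1: 1/2, x2: 1/2, lam: 1}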

  19. Lagrange Multipliers - Inequality Constraints
     [Figure: region $g(\mathbf{x}) > 0$ with boundary $g(\mathbf{x}) = 0$; gradients shown at a boundary point $\mathbf{x}_A$ and an interior point $\mathbf{x}_B$]
     Consider the problem: $\max_{\mathbf{x}} f(\mathbf{x}) \ \text{s.t.}\ g(\mathbf{x}) \geq 0$
     • Optimization is over a region: solutions lie either at stationary points (zero gradient) inside the region or on the boundary
     • Lagrangian: $L(\mathbf{x}, \lambda) = f(\mathbf{x}) + \lambda g(\mathbf{x})$
     • Solutions have either $\nabla f(\mathbf{x}) = 0$ and $\lambda = 0$ (inside the region), or $\nabla f(\mathbf{x}) = -\lambda \nabla g(\mathbf{x})$ and $\lambda > 0$ (on the boundary; $\lambda > 0$ because we are maximizing $f$)
     • In both cases, $\lambda g(\mathbf{x}) = 0$
     • So solutions satisfy $g(\mathbf{x}) \geq 0$, $\lambda \geq 0$, $\lambda g(\mathbf{x}) = 0$
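
  A tiny illustration of these conditions on a one-dimensional problem not from the slides (maximize $f(x) = -(x - c)^2$ subject to $g(x) = x \geq 0$): depending on $c$, the solution is interior with $\lambda = 0$ or on the boundary with $\lambda > 0$, and $\lambda g(x) = 0$ holds in both cases:

      # max f(x) = -(x - c)^2  s.t.  g(x) = x >= 0   (illustrative 1-D problem)
      def solve(c):
          if c >= 0:                # interior: grad f = 0, lambda = 0
              x, lam = c, 0.0
          else:                     # boundary: grad f = -lambda * grad g, lambda > 0
              x, lam = 0.0, -2.0 * c
          g = x
          assert g >= 0 and lam >= 0 and lam * g == 0   # conditions from the slide
          return x, lam

      print(solve(3.0))    # (3.0, 0.0): constraint inactive
      print(solve(-2.0))   # (0.0, 4.0): constraint active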
