SUPPORT VECTOR MACHINES
Matthieu R Bloch
Tuesday, February 25, 2020
LOGISTICS

TAs and office hours
- Tuesday: Dr. Bloch (College of Architecture Cafe), 11:00am - 11:55am
- Tuesday: TJ (VL C449 Cubicle D), 1:30pm - 2:45pm
- Thursday: Hossein (VL C449 Cubicle B), 10:45am - 12:00pm
- Friday: Brighton (TSRB 523a), 12:00pm - 1:15pm

Projects
- Thanks for forming teams
- Start working on your proposals!
- Discussion: proposal deadline extension

Midterm
- March 5th
- Sample midterm posted (do not share)
- Open notes
RECAP: KARUSH-KUHN-TUCKER CONDITIONS

Assume $f$, $\{g_i\}$, $\{h_j\}$ are all differentiable. Consider $x$, $\lambda$, $\mu$.
- Stationarity: $0 = \nabla f(x) + \sum_{i=1}^m \lambda_i \nabla g_i(x) + \sum_{j=1}^p \mu_j \nabla h_j(x)$
- Primal feasibility: $\forall i \in [1;m]\; g_i(x) \leq 0$ and $\forall j \in [1;p]\; h_j(x) = 0$
- Dual feasibility: $\forall i \in [1;m]\; \lambda_i \geq 0$
- Complementary slackness: $\forall i \in [1;m]\; \lambda_i g_i(x) = 0$
KKT CONDITIONS: NECESSITY AND SUFFICIENCY

Theorem (KKT necessity). If $x^*$ and $(\lambda^*, \mu^*)$ are primal and dual solutions with zero duality gap, then $x^*$ and $(\lambda^*, \mu^*)$ satisfy the KKT conditions.

Theorem (KKT sufficiency). If the original problem is convex and $\tilde{x}$ and $(\tilde{\lambda}, \tilde{\mu})$ satisfy the KKT conditions, then $\tilde{x}$ is primal optimal, $(\tilde{\lambda}, \tilde{\mu})$ is dual optimal, and the duality gap is zero.

If a constrained optimization problem is differentiable and convex:
- the KKT conditions are necessary and sufficient for primal/dual optimality (with zero duality gap);
- we can use the KKT conditions to find a solution to our optimization problem.

We're in luck: the optimal soft-margin hyperplane falls in this category!
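As a quick worked example (not from the slides), consider minimizing $f(x) = x^2$ subject to $g(x) = 1 - x \leq 0$. Stationarity requires $0 = \nabla f(x) + \lambda \nabla g(x) = 2x - \lambda$, so $\lambda = 2x$. Complementary slackness requires $\lambda(1 - x) = 0$: if $\lambda = 0$, stationarity forces $x = 0$, which violates primal feasibility $x \geq 1$; hence $1 - x = 0$, giving $x = 1$ and $\lambda = 2 \geq 0$. All four conditions hold and the problem is convex and differentiable, so by sufficiency $x^* = 1$ is primal optimal with zero duality gap.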
OPTIMAL SOFT-MARGIN HYPERPLANE REVISITED

The optimal soft-margin hyperplane is the solution of the following:

$$\operatorname*{argmin}_{w, b, \xi}\; \frac{1}{2}\|w\|_2^2 + \frac{C}{N}\sum_{i=1}^N \xi_i \quad \text{s.t.}\quad \forall i \in [1;N]\; y_i(w^\top x_i + b) \geq 1 - \xi_i \text{ and } \xi_i \geq 0$$

- The optimization problem is differentiable and convex.
- The KKT conditions are necessary and sufficient, and the duality gap is zero.
- We will kernelize the dual problem.

The Lagrangian is

$$L(w, b, \xi, \lambda, \mu) \triangleq \frac{1}{2} w^\top w + \frac{C}{N}\sum_{i=1}^N \xi_i + \sum_{i=1}^N \lambda_i \left(1 - \xi_i - y_i(w^\top x_i + b)\right) - \sum_{i=1}^N \mu_i \xi_i$$

with $\lambda \geq 0$, $\mu \geq 0$.

The Lagrange dual function is $L_D(\lambda, \mu) = \min_{w, b, \xi} L(w, b, \xi, \lambda, \mu)$, and the dual problem is $\max_{\lambda \geq 0, \mu \geq 0} L_D(\lambda, \mu)$.
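As an illustration (not part of the original slides), here is a minimal numerical sketch of this primal problem on a synthetic two-class dataset, assuming the cvxpy package is available; the data and all variable names are made up for the example.

import numpy as np
import cvxpy as cp

# Synthetic toy data: N points in d dimensions, labels in {-1, +1}
rng = np.random.default_rng(0)
N, d, C = 40, 2, 1.0
X = np.vstack([rng.normal(-1.0, 1.0, (N // 2, d)),
               rng.normal(+1.0, 1.0, (N // 2, d))])
y = np.hstack([-np.ones(N // 2), np.ones(N // 2)])

w = cp.Variable(d)
b = cp.Variable()
xi = cp.Variable(N)  # slack variables

# (1/2)||w||^2 + (C/N) sum_i xi_i  s.t.  y_i (w^T x_i + b) >= 1 - xi_i, xi_i >= 0
objective = cp.Minimize(0.5 * cp.sum_squares(w) + (C / N) * cp.sum(xi))
constraints = [cp.multiply(y, X @ w + b) >= 1 - xi, xi >= 0]
cp.Problem(objective, constraints).solve()
print(w.value, b.value)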
OPTIMAL SOFT-MARGIN HYPERPLANE: KERNELIZATION

Let's simplify $L_D(\lambda, \mu)$ using the KKT conditions. Stationarity gives $w = \sum_{i=1}^N \lambda_i y_i x_i$, $\sum_{i=1}^N \lambda_i y_i = 0$, and $\mu_i = \frac{C}{N} - \lambda_i$, which eliminates $w$, $b$, $\xi$, and $\mu$.

Lemma (Simplification of dual function). The dual function is

$$L_D(\lambda, \mu) = -\frac{1}{2}\sum_{i=1}^N \sum_{j=1}^N \lambda_i \lambda_j y_i y_j x_i^\top x_j + \sum_{i=1}^N \lambda_i$$

Lemma (Simplification of dual problem). The dual optimization problem is

$$\max_{\lambda}\; -\frac{1}{2}\sum_{i=1}^N \sum_{j=1}^N \lambda_i \lambda_j y_i y_j x_i^\top x_j + \sum_{i=1}^N \lambda_i \quad \text{s.t.}\quad \sum_{i=1}^N \lambda_i y_i = 0 \;\text{ and }\; \forall i \in [1;N]\; 0 \leq \lambda_i \leq \frac{C}{N}$$

The data enter only through the inner products $x_i^\top x_j$, which is what makes kernelization possible. This is a quadratic program: we can solve for $\lambda^*$ very efficiently.
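For comparison, a sketch of this dual QP (again assuming cvxpy, and reusing the hypothetical X, y, N, C from the primal sketch above); replacing the Gram matrix $K = XX^\top$ with a kernel matrix $(k(x_i, x_j))_{ij}$ is exactly the kernelization step.

import numpy as np
import cvxpy as cp

K = X @ X.T             # Gram matrix; swap in a kernel matrix to kernelize
Q = np.outer(y, y) * K  # Q_ij = y_i y_j x_i^T x_j (positive semidefinite)

lam = cp.Variable(N)
# psd_wrap tells cvxpy to trust that Q is PSD despite floating-point noise
objective = cp.Maximize(cp.sum(lam) - 0.5 * cp.quad_form(lam, cp.psd_wrap(Q)))
constraints = [lam >= 0, lam <= C / N, y @ lam == 0]
cp.Problem(objective, constraints).solve()
lam_star = lam.value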
OPTIMAL SOFT-MARGIN HYPERPLANE: PRIMAL SOLUTIONS

Assume that we now know $(\lambda^*, \mu^*)$; how do we find $(w^*, b^*)$?

Lemma (Finding primal solutions).

$$w^* = \sum_{i=1}^N \lambda_i^* y_i x_i \quad\text{and}\quad b^* = y_i - w^{*\top} x_i \;\text{ for some } i \in [1;N] \text{ such that } 0 < \lambda_i^* < \frac{C}{N}$$

- The only data points that matter are those for which $\lambda_i^* \neq 0$.
- By complementary slackness, these are the points for which $y_i(w^{*\top} x_i + b) = 1 - \xi_i^*$.
- These points are called support vectors; they lie on or inside the margin.
- In practice, the number of support vectors is often $\ll N$.
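Continuing the sketch (hypothetical lam_star, X, y, C, N carried over from the dual sketch above), the lemma is a few lines of numpy; averaging $b^*$ over all margin support vectors rather than picking a single $i$ is a common numerical-stability variant.

import numpy as np

eps = 1e-6  # tolerance: solver output is only approximately zero
w_star = (lam_star * y) @ X  # w* = sum_i lam_i* y_i x_i

# Margin support vectors: 0 < lam_i* < C/N
on_margin = (lam_star > eps) & (lam_star < C / N - eps)
b_star = np.mean(y[on_margin] - X[on_margin] @ w_star)

support = lam_star > eps  # all support vectors
print(f"{support.sum()} support vectors out of N = {N}")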