10-701 Recitation 5: Duality and SVM (Ahmed Hefny)
Outline
• Lagrangian and Duality – The Lagrangian – Duality – Examples
• Support Vector Machines – Primal Formulation – Dual Formulation – Soft Margin and Hinge Loss
Lagrangian
• Consider the problem min_x f(x) s.t. h_i(x) = 0
• Add a Lagrange multiplier for each constraint: L(x, ν) = f(x) + Σ_i ν_i h_i(x)
Lagrangian
• Lagrangian: L(x, ν) = f(x) + Σ_i ν_i h_i(x)
• Setting the gradient to 0 gives
– h_i(x) = 0 [feasible point]
– ∇f(x) + Σ_i ν_i ∇h_i(x) = 0 [cannot decrease f except by violating constraints]
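As a minimal worked sketch of these two conditions (the problem instance below is my own illustration, not from the slides): minimize f(x, y) = x² + y² subject to x + y = 1. Stationarity of the Lagrangian forces x = y, and feasibility then gives x = y = 1/2 with multiplier ν = −1.

```python
# Illustrative instance: minimize f(x, y) = x^2 + y^2 s.t. h(x, y) = x + y - 1 = 0.
# L(x, y, nu) = x^2 + y^2 + nu * (x + y - 1).
# Stationarity gives 2x + nu = 0 and 2y + nu = 0, so x = y;
# feasibility x + y = 1 then gives x = y = 1/2 and nu = -1.
x = y = 0.5
nu = -1.0

assert abs((x + y) - 1.0) < 1e-12   # feasible point: h(x, y) = 0
assert abs(2 * x + nu) < 1e-12      # gradient of L w.r.t. x vanishes
assert abs(2 * y + nu) < 1e-12      # gradient of L w.r.t. y vanishes
```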
Lagrangian
• Consider the problem min_x f(x) s.t. h_i(x) = 0, g_j(x) ≤ 0
• Add a Lagrange multiplier for each constraint: L(x, ν, λ) = f(x) + Σ_i ν_i h_i(x) + Σ_j λ_j g_j(x)
Duality
Duality
• Primal problem: min_x f(x) s.t. h_i(x) = 0, g_j(x) ≤ 0
• Equivalent to min_x max_{λ≥0, ν} [ f(x) + Σ_i ν_i h_i(x) + Σ_j λ_j g_j(x) ]
Duality
• Primal problem: min_x f(x) s.t. h_i(x) = 0, g_j(x) ≤ 0
• Equivalent to min_x { f(x) if x is feasible; ∞ otherwise }
Duality
• Dual problem: max_{λ≥0, ν} min_x [ f(x) + Σ_i ν_i h_i(x) + Σ_j λ_j g_j(x) ]
  The inner minimum of the Lagrangian is the dual function d(λ, ν).
• Dual function:
– Concave, regardless of the convexity of the primal
– Lower bound on the primal
Duality
Primal problem: min_x max_{λ≥0} L(x, λ)
For each row (choice of x), pick the largest element, then select the minimum.
[Figure: table of L(x, λ) values, rows indexed by x, columns by λ]
Duality
Dual problem: max_{λ≥0} min_x L(x, λ)
For each column (choice of λ), pick the smallest element, then select the maximum.
[Figure: table of L(x, λ) values, rows indexed by x, columns by λ]
Duality
Claim (weak duality): min_x max_{λ≥0} L(x, λ) ≥ max_{λ≥0} min_x L(x, λ)
For any λ ≥ 0: min_x L(x, λ) ≤ L(x*, λ) ≤ L(x*, λ*)
The difference between the primal minimum and the dual maximum is called the duality gap; a zero duality gap is called strong duality.
[Figure: table of L(x, λ) values with the primal optimum (x*, λ*) marked]
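The row/column picture above can be checked directly on a small table. The payoff table below is a hypothetical example of my own; the min of the row-maxes always dominates the max of the column-mins, and here the gap is strictly positive because the table has no saddle point.

```python
# Weak duality on a finite grid: for any table of Lagrangian values L,
# min over rows of the row-max >= max over columns of the column-min.
L = [
    [3.0, 1.0, 4.0],
    [2.0, 5.0, 0.0],
    [6.0, 2.0, 3.0],
]

primal = min(max(row) for row in L)                     # min_x max_lam L(x, lam)
dual = max(min(row[j] for row in L) for j in range(3))  # max_lam min_x L(x, lam)

assert primal >= dual                  # weak duality always holds
assert primal == 4.0 and dual == 2.0   # here the duality gap is 2 (no saddle point)
```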
Duality
When does min_x max_{λ≥0} L(x, λ) = max_{λ≥0} min_x L(x, λ)?
When (x*, λ*) is a saddle point: L(x*, λ) ≤ L(x*, λ*) ≤ L(x, λ*) for all x and all λ ≥ 0.
[Figure: table of L(x, λ) values with the saddle point (x*, λ*) marked]
Duality
(x*, λ*) is a saddle point: L(x*, λ) ≤ L(x*, λ*) ≤ L(x, λ*)
– Necessity: by definition of the dual.
– Sufficiency: for any λ, d(λ) = min_x L(x, λ) ≤ L(x*, λ*), and d(λ*) = L(x*, λ*), so the dual attains its upper bound at λ*.
Duality
• If strong duality holds, the KKT conditions apply at the optimal point:
– Stationarity: ∇_x L(x, ν, λ) = 0
– Primal feasibility
– Dual feasibility: λ ≥ 0
– Complementary slackness: λ_j g_j(x) = 0
• The KKT conditions are
– Sufficient (for convex problems)
– Necessary under strong duality
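All four conditions can be verified on a one-line convex problem (an illustrative instance of my own): minimize x² subject to g(x) = 1 − x ≤ 0, whose optimum is x* = 1 with multiplier λ* = 2.

```python
# KKT check for: minimize x^2  s.t.  g(x) = 1 - x <= 0.
# Lagrangian: L(x, lam) = x^2 + lam * (1 - x); optimum x* = 1, lam* = 2.
x_star, lam_star = 1.0, 2.0

g = 1.0 - x_star                           # inequality constraint value at x*
assert g <= 1e-12                          # primal feasibility
assert lam_star >= 0                       # dual feasibility
assert abs(lam_star * g) < 1e-12           # complementary slackness
assert abs(2 * x_star - lam_star) < 1e-12  # stationarity: d/dx [x^2 + lam*(1 - x)] = 0
```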
Example: LP
• Primal: min_x c^T x s.t. Ax ≥ b
• Lagrangian: L(x, λ) = c^T x − λ^T (Ax − b)
Example: LP
• Dual function: d(λ) = min_x c^T x − λ^T (Ax − b)
• Setting the gradient w.r.t. x to 0: c − A^T λ = 0
• Dual problem: max_{λ≥0} λ^T b s.t. c − A^T λ = 0
Why keep this as a constraint?
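A tiny numeric instance (my own illustration) makes the primal/dual pair concrete: with A = I, b = (1, 2), c = (1, 1), the primal optimum is x* = (1, 2) and the dual constraint c − A^T λ = 0 forces λ* = (1, 1); both objectives equal 3.

```python
# LP duality on a tiny instance:
#   primal: min x1 + x2  s.t.  x1 >= 1, x2 >= 2   (A = I, b = (1, 2), c = (1, 1))
#   dual:   max lam^T b  s.t.  c - A^T lam = 0, lam >= 0  ->  lam* = (1, 1)
c = [1.0, 1.0]
b = [1.0, 2.0]

x_star = [1.0, 2.0]    # primal optimum: both constraints tight
lam_star = [1.0, 1.0]  # dual optimum, since A = I gives lam = c

primal_value = sum(ci * xi for ci, xi in zip(c, x_star))
dual_value = sum(li * bi for li, bi in zip(lam_star, b))

# Strong duality holds for feasible, bounded LPs: zero duality gap
assert abs(primal_value - dual_value) < 1e-12
assert primal_value == 3.0
```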
Example: LASSO • We will use duality to transform LASSO into a QP
Example: LASSO
Primal: min_w ½ ‖y − Xw‖² + λ ‖w‖₁
What is the dual function in this case?
Example: LASSO
Reformulated primal: min_{z,w} ½ ‖y − z‖² + λ ‖w‖₁ s.t. z = Xw
Dual function: d(u) = min_{z,w} ½ ‖y − z‖² + λ ‖w‖₁ + u^T (z − Xw)
Example: LASSO
Dual function: d(u) = min_{z,w} ½ ‖y − z‖² + λ ‖w‖₁ + u^T (z − Xw)
Setting the gradient w.r.t. z to zero gives z = y − u; the minimum over w is finite (equal to 0) only when ‖X^T u‖_∞ ≤ λ.
Example: LASSO
• Dual problem: max_u −½ ‖u‖² + u^T y s.t. ‖X^T u‖_∞ ≤ λ
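A sanity check of this dual in the special case X = I (a simplifying assumption of mine, not from the slides): the primal solution is then coordinate-wise soft-thresholding, w_i = sign(y_i) · max(|y_i| − λ, 0), and u = y − Xw is dual feasible with zero duality gap.

```python
# LASSO primal/dual check with X = I: primal solution is soft-thresholding.
lam = 1.0
y = [3.0, -0.5, 2.0, 0.2]

w = [(abs(v) - lam) * (1 if v > 0 else -1) if abs(v) > lam else 0.0 for v in y]
u = [yi - wi for yi, wi in zip(y, w)]  # u = y - z with z = X w = w

# Dual feasibility: ||X^T u||_inf = ||u||_inf <= lam
assert max(abs(ui) for ui in u) <= lam + 1e-12

primal = 0.5 * sum((yi - wi) ** 2 for yi, wi in zip(y, w)) + lam * sum(abs(wi) for wi in w)
dual = -0.5 * sum(ui ** 2 for ui in u) + sum(ui * yi for ui, yi in zip(u, y))
assert abs(primal - dual) < 1e-9  # strong duality: zero gap
```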
Support Vector Machines
[Figure: separating hyperplane with margin, from docs.opencv.org]
Support Vector Machines
• Find the maximum-margin hyperplane.
• The "distance" from a point x_i to the hyperplane ⟨w, x⟩ + b = 0 is given by d_i = (⟨w, x_i⟩ + b) / ‖w‖
• Margin = min_i y_i d_i = (1/‖w‖) min_i y_i (⟨w, x_i⟩ + b)
• Max margin: max_{w,b} (1/‖w‖) min_i y_i (⟨w, x_i⟩ + b)
Support Vector Machines
• Max margin: max_{w,b} (1/‖w‖) min_i y_i (⟨w, x_i⟩ + b)
• Unpleasant (a max of a min?)
• No unique solution (scaling w and b by the same constant does not change the objective)
Support Vector Machines
• Max margin: max_{w,b} (1/‖w‖) min_i y_i (⟨w, x_i⟩ + b)
s.t. min_i y_i (⟨w, x_i⟩ + b) = 1
Support Vector Machines
• Max margin: min_{w,b} ½ ‖w‖²
s.t. min_i y_i (⟨w, x_i⟩ + b) = 1
Support Vector Machines
• Max margin (canonical representation):
min_{w,b} ½ ‖w‖² s.t. y_i (⟨w, x_i⟩ + b) ≥ 1, ∀i
• A QP, much better than max_{w,b} (1/‖w‖) min_i y_i (⟨w, x_i⟩ + b)
SVM Dual Problem
Recall that the Lagrangian is formed by adding a Lagrange multiplier for each constraint:
L(w, b, α) = ½ ‖w‖² − Σ_i α_i [ y_i (⟨w, x_i⟩ + b) − 1 ]
SVM Dual Problem
L(w, b, α) = ½ ‖w‖² − Σ_i α_i [ y_i (⟨w, x_i⟩ + b) − 1 ]
Fix α and minimize w.r.t. w, b:
– w − Σ_i α_i y_i x_i = 0 [plug in w = Σ_i α_i y_i x_i]
– Σ_i α_i y_i = 0 [keep as a constraint (why?)]
SVM Dual Problem
Dual problem:
max_α Σ_i α_i − ½ Σ_i Σ_j α_i α_j y_i y_j ⟨x_i, x_j⟩
s.t. Σ_i α_i y_i = 0, α_i ≥ 0
Another QP. So what?
SVM Dual Problem
• Only inner products → kernel trick
• Complementary slackness → support vectors
• KKT conditions lead to efficient optimization algorithms (compared to a general QP solver)
SVM Dual Problem
• Classification of a test point: f(x) = ⟨w, x⟩ + b = Σ_i α_i y_i ⟨x_i, x⟩ + b
• To get b, use the fact that y_i f(x_i) = 1 for any support vector.
• For numerical stability, average over all support vectors.
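The whole pipeline can be traced by hand on a two-point data set (my own illustration): x₁ = (1, 1) with y₁ = +1 and x₂ = (−1, −1) with y₂ = −1. The constraint Σ α_i y_i = 0 forces α₁ = α₂ = a, the dual objective reduces to 2a − 4a², maximized at a = 1/4, and b is recovered from y_i f(x_i) = 1.

```python
# SVM dual solved by hand for two separable points.
a = 0.25  # maximizer of the reduced dual objective 2a - 4a^2
x1, y1 = (1.0, 1.0), 1.0
x2, y2 = (-1.0, -1.0), -1.0

# w = sum_i alpha_i y_i x_i
w = tuple(a * y1 * v1 + a * y2 * v2 for v1, v2 in zip(x1, x2))
assert w == (0.5, 0.5)

# b from y_i f(x_i) = 1 on a support vector (here x1): f(x1) = <w, x1> + b
b = 1.0 / y1 - sum(wi * vi for wi, vi in zip(w, x1))
assert abs(b) < 1e-12

# Both points are support vectors sitting exactly on the margin: y_i f(x_i) = 1
for (x, y) in [(x1, y1), (x2, y2)]:
    f = sum(wi * vi for wi, vi in zip(w, x)) + b
    assert abs(y * f - 1.0) < 1e-12
```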
Soft Margin SVM
Hard-margin SVM as loss + regularization:
min_{w,b} Σ_i E_∞(1 − y_i (⟨w, x_i⟩ + b)) + ½ ‖w‖²
where E_∞(z) = ∞ for z > 0 and 0 for z ≤ 0
The first term is the loss, the second the regularization.
[Figure: E_∞ plotted against y_i f(x_i)]
Soft Margin SVM
Relax it a little bit:
min_{w,b} Σ_i E_C(1 − y_i (⟨w, x_i⟩ + b)) + ½ ‖w‖²
where E_C(z) = Cz for z ≥ 0 and 0 for z < 0
[Figure: hinge loss plotted against y_i f(x_i)]
Soft Margin SVM
Relax it a little bit:
min_{w,b} C Σ_i [ 1 − y_i (⟨w, x_i⟩ + b) ]₊ + ½ ‖w‖²
[Figure: hinge loss plotted against y_i f(x_i)]
Soft Margin SVM
Equivalent formulation:
min_{w,b,ξ} ½ ‖w‖² + C Σ_i ξ_i
s.t. ξ_i ≥ 0, y_i (⟨w, x_i⟩ + b) ≥ 1 − ξ_i
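The equivalence rests on one observation: for fixed (w, b), the smallest feasible slack is exactly the hinge loss, ξ_i = max(0, 1 − y_i f(x_i)). A small check with an arbitrary, not optimized (w, b) and toy data of my own:

```python
# Optimal slacks for fixed (w, b) equal the hinge losses.
C = 1.0
w, b = (0.5, 0.5), 0.0
data = [((1.0, 2.0), 1.0), ((-1.0, -1.0), -1.0), ((0.5, -0.5), 1.0)]

def f(x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

xi = [max(0.0, 1.0 - y * f(x)) for x, y in data]  # optimal slacks given (w, b)

# The slacks are feasible for the constrained formulation...
for (x, y), s in zip(data, xi):
    assert s >= 0.0
    assert y * f(x) >= 1.0 - s - 1e-12

# ...and the two objectives agree: regularizer + C * total hinge loss
objective = 0.5 * sum(wi ** 2 for wi in w) + C * sum(xi)
assert xi == [0.0, 0.0, 1.0]   # only the third point violates the margin
assert abs(objective - 1.25) < 1e-12
```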