Support Vector Machines


  1. Support Vector Machines. Charlie Frogner, MIT, 2011. Slides mostly stolen from Ryan Rifkin (Google).

  2. Plan. Regularization derivation of SVMs. Analyzing the SVM problem: optimization, duality. Geometric derivation of SVMs. Practical issues.

  3. The Regularization Setting (Again). Given $n$ examples $(x_1, y_1), \ldots, (x_n, y_n)$, with $x_i \in \mathbb{R}^n$ and $y_i \in \{-1, 1\}$ for all $i$, we can find a classification function by solving a regularized learning problem:

$$\operatorname*{argmin}_{f \in \mathcal{H}} \; \frac{1}{n} \sum_{i=1}^{n} V(y_i, f(x_i)) + \lambda \|f\|_{\mathcal{H}}^2.$$

Note that in this class we are specifically considering binary classification.

  4. The Hinge Loss. The classical SVM arises by considering the specific loss function $V(y, f(x)) \equiv (1 - y f(x))_+$, where $(k)_+ \equiv \max(k, 0)$.
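
As a quick illustration (not from the slides), here is a minimal NumPy sketch of the hinge loss; the example values are made up:

```python
import numpy as np

def hinge_loss(y, fx):
    """V(y, f(x)) = (1 - y*f(x))_+ = max(1 - y*f(x), 0), elementwise."""
    return np.maximum(1.0 - y * fx, 0.0)

# Correct predictions with margin >= 1 incur zero loss;
# the loss grows linearly as y*f(x) falls below 1.
y  = np.array([1.0, -1.0,  1.0])
fx = np.array([2.0, -0.5, -1.0])
print(hinge_loss(y, fx))  # [0.  0.5 2. ]
```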

  5. The Hinge Loss. [Figure: plot of the hinge loss against $y \cdot f(x)$; the loss is zero for $y f(x) \ge 1$ and grows linearly as $y f(x)$ decreases below 1.]

  6. Substituting In The Hinge Loss. With the hinge loss, our regularization problem becomes

$$\operatorname*{argmin}_{f \in \mathcal{H}} \; \frac{1}{n} \sum_{i=1}^{n} (1 - y_i f(x_i))_+ + \lambda \|f\|_{\mathcal{H}}^2.$$

Note that we don't have a $\frac{1}{2}$ multiplier on the regularization term.

  7. Slack Variables. This problem is non-differentiable (because of the "kink" in $V$), so we rewrite the "max" function using slack variables $\xi_i$:

$$\operatorname*{argmin}_{f \in \mathcal{H}} \; \frac{1}{n} \sum_{i=1}^{n} \xi_i + \lambda \|f\|_{\mathcal{H}}^2$$
$$\text{subject to: } \quad \xi_i \ge 1 - y_i f(x_i), \quad \xi_i \ge 0, \quad i = 1, \ldots, n.$$

  8. Applying The Representer Theorem. Substituting in

$$f^*(x) = \sum_{i=1}^{n} c_i K(x, x_i),$$

we get a constrained quadratic programming problem:

$$\operatorname*{argmin}_{c \in \mathbb{R}^n, \, \xi \in \mathbb{R}^n} \; \frac{1}{n} \sum_{i=1}^{n} \xi_i + \lambda \, c^T K c$$
$$\text{subject to: } \quad \xi_i \ge 1 - y_i \sum_{j=1}^{n} c_j K(x_i, x_j), \quad \xi_i \ge 0, \quad i = 1, \ldots, n.$$
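
To make this concrete, here is a minimal sketch of building the kernel matrix $K$ and evaluating $f(x) = \sum_i c_i K(x, x_i)$. The Gaussian kernel and the toy data are assumptions; the slides leave $K$ generic:

```python
import numpy as np

def rbf_kernel(X1, X2, sigma=1.0):
    """Gaussian kernel K(x, x') = exp(-||x - x'||^2 / (2 sigma^2)).
    The kernel choice is an assumption; any positive-definite K works."""
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

X = np.random.randn(20, 2)   # toy data: n = 20 points in R^2
K = rbf_kernel(X, X)         # the n x n matrix appearing in c^T K c

def f(x, c):
    """Evaluate f(x) = sum_i c_i K(x, x_i) for a single point x."""
    return rbf_kernel(x[None, :], X)[0] @ c
```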

  9. Adding A Bias Term. Adding an unregularized bias term $b$ (which presents some theoretical difficulties) we get the "primal" SVM:

$$\operatorname*{argmin}_{c \in \mathbb{R}^n, \, b \in \mathbb{R}, \, \xi \in \mathbb{R}^n} \; \frac{1}{n} \sum_{i=1}^{n} \xi_i + \lambda \, c^T K c$$
$$\text{subject to: } \quad \xi_i \ge 1 - y_i \left( \sum_{j=1}^{n} c_j K(x_i, x_j) + b \right), \quad \xi_i \ge 0, \quad i = 1, \ldots, n.$$

  10. Standard Notation. In most of the SVM literature, instead of $\lambda$, a parameter $C$ is used to control regularization:

$$C = \frac{1}{2 \lambda n}.$$

Using this definition (after multiplying our objective function by the constant $\frac{1}{2\lambda}$), the regularization problem becomes

$$\operatorname*{argmin}_{f \in \mathcal{H}} \; C \sum_{i=1}^{n} V(y_i, f(x_i)) + \frac{1}{2} \|f\|_{\mathcal{H}}^2.$$

Like $\lambda$, the parameter $C$ also controls the tradeoff between classification accuracy and the norm of the function. The primal problem becomes...
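
Spelling out the rescaling the slide refers to (a one-line check, using only the definitions above):

```latex
% Multiply the original objective by the constant 1/(2*lambda):
\frac{1}{2\lambda}\left( \frac{1}{n}\sum_{i=1}^{n} V(y_i, f(x_i))
    + \lambda \|f\|_{\mathcal{H}}^2 \right)
  = \underbrace{\frac{1}{2\lambda n}}_{=\,C} \sum_{i=1}^{n} V(y_i, f(x_i))
    + \frac{1}{2}\|f\|_{\mathcal{H}}^2
```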

  11. The Reparametrized Problem.

$$\operatorname*{argmin}_{c \in \mathbb{R}^n, \, b \in \mathbb{R}, \, \xi \in \mathbb{R}^n} \; C \sum_{i=1}^{n} \xi_i + \frac{1}{2} c^T K c$$
$$\text{subject to: } \quad \xi_i \ge 1 - y_i \left( \sum_{j=1}^{n} c_j K(x_i, x_j) + b \right), \quad \xi_i \ge 0, \quad i = 1, \ldots, n.$$

  12. How to Solve?

$$\operatorname*{argmin}_{c \in \mathbb{R}^n, \, b \in \mathbb{R}, \, \xi \in \mathbb{R}^n} \; C \sum_{i=1}^{n} \xi_i + \frac{1}{2} c^T K c$$
$$\text{subject to: } \quad \xi_i \ge 1 - y_i \left( \sum_{j=1}^{n} c_j K(x_i, x_j) + b \right), \quad \xi_i \ge 0, \quad i = 1, \ldots, n.$$

This is a constrained optimization problem. The general approach: form the primal problem (we did this); form the Lagrangian from the primal (just like Lagrange multipliers); form the dual (one dual variable associated to each primal constraint in the Lagrangian).

  13. Lagrangian. We derive the dual from the primal using the Lagrangian:

$$L(c, \xi, b, \alpha, \zeta) = C \sum_{i=1}^{n} \xi_i + \frac{1}{2} c^T K c - \sum_{i=1}^{n} \alpha_i \left( y_i \left\{ \sum_{j=1}^{n} c_j K(x_i, x_j) + b \right\} - 1 + \xi_i \right) - \sum_{i=1}^{n} \zeta_i \xi_i.$$

  14. Dual I. The dual problem is:

$$\operatorname*{argmax}_{\alpha, \zeta \ge 0} \; \inf_{c, \xi, b} L(c, \xi, b, \alpha, \zeta).$$

First, minimize $L$ w.r.t. $(c, \xi, b)$:

$$\frac{\partial L}{\partial c} = 0 \implies c_i = \alpha_i y_i \quad (1)$$
$$\frac{\partial L}{\partial b} = 0 \implies \sum_{i=1}^{n} \alpha_i y_i = 0 \quad (2)$$
$$\frac{\partial L}{\partial \xi_i} = 0 \implies C - \alpha_i - \zeta_i = 0 \implies 0 \le \alpha_i \le C \quad (3)$$

  15. Dual II. Dual:

$$\operatorname*{argmax}_{\alpha, \zeta \ge 0} \; \inf_{c, \xi, b} L(c, \xi, b, \alpha, \zeta).$$

Optimality conditions: (1) $c_i = \alpha_i y_i$; (2) $\sum_{i=1}^{n} \alpha_i y_i = 0$; (3) $\alpha_i \in [0, C]$. Plug in (2) and (3):

$$\operatorname*{argmax}_{\alpha \ge 0} \; \inf_{c} L(c, \alpha) = \frac{1}{2} c^T K c + \sum_{i=1}^{n} \alpha_i \left( 1 - y_i \sum_{j=1}^{n} K(x_i, x_j) c_j \right).$$

  16. Dual II. Dual:

$$\operatorname*{argmax}_{\alpha, \zeta \ge 0} \; \inf_{c, \xi, b} L(c, \xi, b, \alpha, \zeta).$$

Optimality conditions: (1) $c_i = \alpha_i y_i$; (2) $\sum_{i=1}^{n} \alpha_i y_i = 0$; (3) $\alpha_i \in [0, C]$. Plug in (1):

$$\operatorname*{argmax}_{\alpha \ge 0} \; L(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i y_i K(x_i, x_j) \alpha_j y_j = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \alpha^T (\operatorname{diag} Y) K (\operatorname{diag} Y) \alpha.$$

  17. The Primal and Dual Problems Again. Primal:

$$\operatorname*{argmin}_{c \in \mathbb{R}^n, \, b \in \mathbb{R}, \, \xi \in \mathbb{R}^n} \; C \sum_{i=1}^{n} \xi_i + \frac{1}{2} c^T K c$$
$$\text{subject to: } \quad \xi_i \ge 1 - y_i \left( \sum_{j=1}^{n} c_j K(x_i, x_j) + b \right), \quad \xi_i \ge 0, \quad i = 1, \ldots, n.$$

Dual:

$$\max_{\alpha \in \mathbb{R}^n} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \alpha^T Q \alpha$$
$$\text{subject to: } \quad \sum_{i=1}^{n} y_i \alpha_i = 0, \quad 0 \le \alpha_i \le C, \quad i = 1, \ldots, n,$$

where $Q = (\operatorname{diag} Y) K (\operatorname{diag} Y)$, i.e. $Q_{ij} = y_i y_j K(x_i, x_j)$.
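
As a minimal sketch of solving this dual with an off-the-shelf QP routine (SciPy's SLSQP here; the solver choice is an assumption, and real SVM packages use the specialized decomposition methods mentioned on the next slide):

```python
import numpy as np
from scipy.optimize import minimize

def solve_svm_dual(K, y, C):
    """Maximize sum(a) - 0.5 a^T Q a  s.t.  y^T a = 0, 0 <= a_i <= C,
    where Q = diag(y) K diag(y). A generic-QP sketch, not production code."""
    n = len(y)
    Q = np.outer(y, y) * K
    obj  = lambda a: 0.5 * a @ Q @ a - a.sum()   # minimize the negated dual
    grad = lambda a: Q @ a - np.ones(n)
    res = minimize(obj, np.zeros(n), jac=grad, method="SLSQP",
                   bounds=[(0.0, C)] * n,
                   constraints=[{"type": "eq", "fun": lambda a: a @ y}])
    return res.x  # the optimal alphas; c_i = y_i * alpha_i (slide 14)
```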

  18. SVM Training. Basic idea: solve the dual problem to find the optimal $\alpha$'s, and use them to find $b$ and $c$. The dual problem is easier to solve than the primal problem: it has simple box constraints and a single equality constraint, and the problem can be decomposed into a sequence of smaller problems (see appendix).
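
In practice one rarely hand-rolls this solver. Assuming scikit-learn is available (an assumption; the slides do not mention it), its SVC class solves this same dual via a decomposition method:

```python
import numpy as np
from sklearn.svm import SVC

X = np.random.randn(100, 2)
y = np.sign(X[:, 0] + X[:, 1])   # toy labels in {-1, +1}

clf = SVC(C=1.0, kernel="rbf").fit(X, y)
print(clf.support_)      # indices i with alpha_i > 0: the support vectors
print(clf.dual_coef_)    # y_i * alpha_i for each support vector (our c_i)
print(clf.intercept_)    # the bias b
```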

  19. Interpreting the solution. $\alpha$ tells us: $c$ and $b$; the identities of the misclassified points. How to analyze? Use the optimality conditions. Already used: the derivative of $L$ w.r.t. $(c, \xi, b)$ is zero at optimality. Haven't used: complementary slackness, primal/dual constraints.

  20. Optimality Conditions: all of them. All optimal solutions must satisfy:

$$\sum_{j=1}^{n} c_j K(x_i, x_j) - \sum_{j=1}^{n} y_j \alpha_j K(x_i, x_j) = 0, \quad i = 1, \ldots, n$$
$$\sum_{i=1}^{n} \alpha_i y_i = 0$$
$$C - \alpha_i - \zeta_i = 0, \quad i = 1, \ldots, n$$
$$y_i \left( \sum_{j=1}^{n} y_j \alpha_j K(x_i, x_j) + b \right) - 1 + \xi_i \ge 0, \quad i = 1, \ldots, n$$
$$\alpha_i \left[ y_i \left( \sum_{j=1}^{n} y_j \alpha_j K(x_i, x_j) + b \right) - 1 + \xi_i \right] = 0, \quad i = 1, \ldots, n$$
$$\zeta_i \xi_i = 0, \quad i = 1, \ldots, n$$
$$\xi_i, \alpha_i, \zeta_i \ge 0, \quad i = 1, \ldots, n$$
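
These conditions can be checked numerically for a candidate solution. A minimal sketch (it substitutes $c_i = y_i \alpha_i$, $\zeta_i = C - \alpha_i$, and the smallest feasible $\xi_i$, so only the remaining conditions need checking; the max-violation format is a design choice, not from the slides):

```python
import numpy as np

def kkt_violation(K, y, alpha, b, C):
    """Largest violation of the optimality conditions above, after
    substituting c = y*alpha, zeta = C - alpha, xi = (1 - y*f)_+."""
    f  = K @ (y * alpha) + b              # f(x_i) at the training points
    xi = np.maximum(1.0 - y * f, 0.0)     # smallest feasible slacks
    return max(
        abs(alpha @ y),                            # sum_i alpha_i y_i = 0
        np.abs(alpha * (y * f - 1.0 + xi)).max(),  # complementary slackness
        np.abs((C - alpha) * xi).max(),            # zeta_i * xi_i = 0
    )
```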

  21. Optimality Conditions II. These optimality conditions are both necessary and sufficient for optimality: $(c, \xi, b, \alpha, \zeta)$ satisfy all of the conditions if and only if they are optimal for both the primal and the dual. (Also known as the Karush-Kuhn-Tucker (KKT) conditions.)

  22. Interpreting the solution — c.

$$\frac{\partial L}{\partial c} = 0 \implies c_i = \alpha_i y_i, \quad \forall i.$$

  23. Interpreting the solution — b. Suppose we have the optimal $\alpha_i$'s. Also suppose that there exists an $i$ satisfying $0 < \alpha_i < C$. Then:

$$\alpha_i < C \implies \zeta_i > 0 \implies \xi_i = 0$$
$$\implies y_i \left( \sum_{j=1}^{n} y_j \alpha_j K(x_i, x_j) + b \right) - 1 = 0$$
$$\implies b = y_i - \sum_{j=1}^{n} y_j \alpha_j K(x_i, x_j).$$
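
In code, this gives a direct way to recover $b$ from the optimal $\alpha$'s. A small sketch (averaging over all free support vectors is a common numerical-stability choice, not something the slide prescribes):

```python
import numpy as np

def compute_b(K, y, alpha, C, tol=1e-8):
    """b = y_i - sum_j y_j alpha_j K(x_i, x_j) for any i with 0 < alpha_i < C.
    Assumes at least one such 'free' support vector exists, as the slide does."""
    free = (alpha > tol) & (alpha < C - tol)
    return np.mean(y[free] - K[free] @ (y * alpha))
```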

  24. Interpreting the solution — sparsity. (Remember we defined $f(x) = \sum_{i=1}^{n} y_i \alpha_i K(x, x_i) + b$.)

$$y_i f(x_i) > 1 \implies (1 - y_i f(x_i)) < 0 \implies \xi_i \ne (1 - y_i f(x_i)) \implies \alpha_i = 0.$$

(Since $\xi_i \ge 0 > 1 - y_i f(x_i)$, we have $y_i f(x_i) - 1 + \xi_i > 0$, so complementary slackness forces $\alpha_i = 0$.)

  25. Interpreting the solution — support vectors.

$$y_i f(x_i) < 1 \implies (1 - y_i f(x_i)) > 0 \implies \xi_i > 0 \implies \zeta_i = 0 \implies \alpha_i = C.$$

  26. Interpreting the solution — support vectors. So $y_i f(x_i) < 1 \implies \alpha_i = C$. Conversely, suppose $\alpha_i = C$:

$$\alpha_i = C \implies \xi_i = 1 - y_i f(x_i) \implies y_i f(x_i) \le 1.$$

  27. Interpreting the solution. Here are all of the derived conditions:

$$\alpha_i = 0 \implies y_i f(x_i) \ge 1$$
$$0 < \alpha_i < C \implies y_i f(x_i) = 1$$
$$\alpha_i = C \implies y_i f(x_i) \le 1$$
$$\alpha_i = 0 \impliedby y_i f(x_i) > 1$$
$$\alpha_i = C \impliedby y_i f(x_i) < 1$$
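
These conditions sort the training points into three groups using the $\alpha_i$ values alone. A minimal sketch (the tolerance is a numerical assumption, not part of the math):

```python
import numpy as np

def categorize_points(alpha, C, tol=1e-8):
    """Group points by the derived conditions above."""
    labels = np.empty(len(alpha), dtype=object)
    labels[alpha <= tol] = "alpha=0: y_i f(x_i) >= 1 (not a support vector)"
    labels[alpha >= C - tol] = "alpha=C: y_i f(x_i) <= 1 (on or inside margin)"
    free = (alpha > tol) & (alpha < C - tol)
    labels[free] = "0<alpha<C: y_i f(x_i) = 1 (exactly on the margin)"
    return labels
```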

  28. Geometric Interpretation of Reduced Optimality Conditions. [Figure.]

  29. Summary so far. The SVM is a Tikhonov regularization problem, using the hinge loss:

$$\operatorname*{argmin}_{f \in \mathcal{H}} \; \frac{1}{n} \sum_{i=1}^{n} (1 - y_i f(x_i))_+ + \lambda \|f\|_{\mathcal{H}}^2.$$

Solving the SVM means solving a constrained quadratic program. Solutions can be sparse: some coefficients are zero. The nonzero coefficients correspond to points that aren't classified correctly enough; this is where the "support vector" in SVM comes from.
