Machine Learning: Support Vector Machines
Rui Xia, Text Mining Group, Nanjing University of Science & Technology
rxia@njust.edu.cn
Outline
• Maximum Margin Linear Classifier
• Duality Optimization
• Soft-margin SVM
• Kernel Functions
• *Sequential Minimal Optimization
• The Usage of SVM Toolkits
Machine Learning Course, NJUST
Maximum Margin Linear Classifier
Recall Previous Linear Classifiers
• Perceptron Criterion
• Cross Entropy Criterion (Logistic Regression)
• Least Mean Square (LMS) Criterion
• …
Which linear hyper-plane is better? Which learning criterion to choose?
Maximum Margin Criterion
Distance from Point to Hyper-plane
• Linear Model
• Hyper-plane
• Distance (positive side)
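The point-to-hyper-plane distance can be sketched in a few lines of Python (a toy illustration; the hyper-plane and point below are made-up values, not from the slides):

```python
import math

def geometric_distance(w, b, x):
    """Signed distance from point x to the hyper-plane w.x + b = 0.

    Positive on the side w points to, negative on the other side.
    """
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return (dot + b) / math.sqrt(sum(wi * wi for wi in w))

# Example: hyper-plane x1 + x2 - 1 = 0, point (1, 1)
d = geometric_distance([1.0, 1.0], -1.0, [1.0, 1.0])
# (1 + 1 - 1) / sqrt(2) = 1/sqrt(2) ~ 0.707
```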
Geometric Distance & Functional Distance
• Distance (negative side)
• Geometric distance (uniform expression)
• Functional distance
Parameter Scaling
• Scaling the parameters by a scale factor
• Geometric margin: independent of the scale factor
• Functional margin: proportional to the scale factor
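The scaling property above can be verified numerically (a minimal sketch with made-up toy values; the margin definitions follow the slides, with labels y in {-1, +1}):

```python
import math

def functional_margin(w, b, x, y):
    # y * (w.x + b): grows when (w, b) is scaled up
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)

def geometric_margin(w, b, x, y):
    # functional margin divided by ||w||: scale-invariant
    return functional_margin(w, b, x, y) / math.sqrt(sum(wi * wi for wi in w))

w, b, x, y = [2.0, 1.0], -1.0, [1.0, 2.0], 1
c = 10.0  # scale factor
sw, sb = [c * wi for wi in w], c * b

# functional margin grows by the factor c ...
assert abs(functional_margin(sw, sb, x, y) - c * functional_margin(w, b, x, y)) < 1e-9
# ... while the geometric margin is unchanged
assert abs(geometric_margin(sw, sb, x, y) - geometric_margin(w, b, x, y)) < 1e-9
```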
Maximum Margin Criterion
• Formulation 1
Maximum Margin Criterion
• Formulation 2
Maximum Margin Criterion
• Scaling Constraint
• In this constraint
Maximum Margin Criterion
• Formulation 3
Duality Optimization
Lagrange Multiplier
• In case of an equality constraint
An Example
Lagrange Multiplier
• In case of an inequality constraint (active / inactive)
An Illustration
Lagrange Multiplier
• In case of multiple equality and inequality constraints
– Stationarity
– Primal feasibility
– Dual feasibility
– Complementary slackness
Karush–Kuhn–Tucker (KKT) Conditions
Generalized Lagrangian and Duality
• Primal Optimization Problem
• Generalized Lagrangian
Min-max of Lagrangian
Primal Problem & Dual Problem
• The primal problem (min-max of the Lagrangian)
• The dual problem (max-min of the Lagrangian)
• Max-min vs. min-max: max-min ≤ min-max always holds (weak duality). When does the equality hold?
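The max-min ≤ min-max inequality can be checked on a tiny discrete example (a made-up 2×2 payoff table, not from the slides; it plays the role of the Lagrangian evaluated at a few points):

```python
# Weak duality on a toy 2x2 table L[i][j]:
#   max_j min_i L[i][j]  <=  min_i max_j L[i][j]
L = [[1, 4],
     [3, 2]]

# max over columns of the column-wise minimum (the "max-min" side)
min_then_max = max(min(L[i][j] for i in range(2)) for j in range(2))
# min over rows of the row-wise maximum (the "min-max" side)
max_then_min = min(max(L[i][j] for j in range(2)) for i in range(2))

assert min_then_max <= max_then_min  # 2 <= 3: strict here, so no strong duality
```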
Equivalency of the Two Problems
• The equality holds when
– f and the g_i's are convex, and the h_i's are affine;
– the g_i's are (strictly) feasible: there exists some w such that g_i(w) < 0.
• Equivalency of the primal and dual problems: their optimal values coincide.
Karush–Kuhn–Tucker (KKT) Conditions
• Furthermore, the solutions of the primal and dual problems satisfy the KKT conditions:
– Stationarity
– Primal feasibility
– Dual feasibility
– Complementary slackness
Sufficient and necessary conditions.
Lagrangian for SVM
• The optimization problem of SVM
• The Lagrangian
Minimization of the Lagrangian
• Take the derivatives of the Lagrangian
• Plug back into the Lagrangian
Dual Problem of SVM
• Dual Problem
• Guarantee that the KKT conditions are satisfied.
Why “Support Vector”?
• Decision function
• KKT conditions
The Value of the Bias
• A point with y_i = +1 and α_i > 0 is a positive support vector; one with y_i = −1 and α_i > 0 is a negative support vector.
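Given the learned w, the bias follows from any support vector, since its functional margin is exactly 1: y_s (w·x_s + b) = 1, hence b = y_s − w·x_s (using y_s ∈ {−1, +1}, so 1/y_s = y_s). A minimal sketch with made-up toy values:

```python
def bias_from_support_vector(w, x_s, y_s):
    # b = y_s - w.x_s, from y_s * (w.x_s + b) = 1 and y_s in {-1, +1}
    return y_s - sum(wi * xi for wi, xi in zip(w, x_s))

w = [1.0, 1.0]
b_pos = bias_from_support_vector(w, [1.0, 0.0], +1)   # positive support vector
b_neg = bias_from_support_vector(w, [0.0, -2.0], -1)  # negative support vector
```

In practice one averages b over all support vectors for numerical stability.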
One Remaining Problem
• Decision function: how to compute the α's?
• Dual Problem of SVM: how to solve the dual optimization problem?
Soft-margin SVM
Linearly Non-separable Case
• Linearly separable vs. linearly non-separable
Soft Margin Criterion
• From maximum margin to soft margin
Three Types of Slacks
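The three slack cases can be made concrete with the standard hinge-style slack ξ_i = max(0, 1 − y_i(w·x_i + b)) (a sketch with made-up 1-D toy values, not from the slides):

```python
def slack(w, b, x, y):
    """Slack variable xi = max(0, 1 - y * (w.x + b)) of the soft margin."""
    return max(0.0, 1.0 - y * (sum(wi * xi for wi, xi in zip(w, x)) + b))

# Case 1: correctly classified outside the margin -> xi = 0
assert slack([1.0], 0.0, [2.0], +1) == 0.0
# Case 2: inside the margin but correctly classified -> 0 < xi < 1
assert 0.0 < slack([1.0], 0.0, [0.5], +1) < 1.0
# Case 3: misclassified -> xi > 1
assert slack([1.0], 0.0, [-0.5], +1) > 1.0
```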
Lagrangian for Soft-margin SVM
• Recall the equivalency of the primal and dual problems
• Lagrangian form
Dual Problem for Soft-margin SVM
• Gradients
• Plug back into the Lagrangian
Maximum-margin SVM vs. Soft-margin SVM
• Maximum-margin SVM
• Soft-margin SVM
KKT Complementarity Conditions
• Two KKT complementarity conditions
• Some useful conclusions
Slacks and Support Vectors
Kernel Functions
From Low-dimensional Non-separable to Higher-dimensional Separable
From Low Dimension to Higher Dimension
• Feature space mapping: from low-dimensional non-separable to higher-dimensional separable
Kernel Functions
• Definition: the inner product in the higher-dimensional feature space
• An example
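The classic 2-D example of this definition: the degree-2 polynomial kernel K(x, z) = (x·z)² equals the inner product under the explicit feature map φ(x) = (x₁², √2·x₁x₂, x₂²). A numeric check (toy points are made up):

```python
import math

def phi(x):
    # explicit feature map for K(x, z) = (x.z)^2 in two dimensions
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]

def K(x, z):
    # the kernel computed directly in the input space
    return (x[0] * z[0] + x[1] * z[1]) ** 2

x, z = [1.0, 2.0], [3.0, 4.0]
lhs = sum(a * b for a, b in zip(phi(x), phi(z)))
assert abs(lhs - K(x, z)) < 1e-9  # phi(x).phi(z) == K(x, z)
```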
SVM in Higher-dimensional Feature Space
• Decision function
• Training process
Kernel Trick in SVM
– Sometimes it is hard to know the exact projection function, but relatively easy to know the kernel function.
– In SVM, all calculations on feature vectors appear in the form of inner products.
– Therefore, we only need to know the kernel function used in SVM, without knowing the exact projection function.
Mercer Condition
• Kernel matrix
– For any finite set of points
– Element of kernel matrix
• A valid kernel satisfies
– Symmetric
– Positive semi-definite
• Mercer theorem
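The two conditions can be probed empirically: build the Gram matrix on a finite point set and check symmetry and that the quadratic form cᵀKc is non-negative for random coefficient vectors (a sanity-check sketch with made-up points, not a proof of positive semi-definiteness):

```python
import random

def gram(kernel, points):
    # Gram (kernel) matrix K[i][j] = kernel(points[i], points[j])
    return [[kernel(a, b) for b in points] for a in points]

dot = lambda x, z: sum(u * v for u, v in zip(x, z))  # linear kernel is valid
pts = [[1.0, 0.0], [0.5, 0.5], [-1.0, 2.0]]
K = gram(dot, pts)

# symmetric
assert all(K[i][j] == K[j][i] for i in range(3) for j in range(3))
# quadratic form c^T K c >= 0 for random c (consistent with PSD)
random.seed(0)
for _ in range(100):
    c = [random.uniform(-1, 1) for _ in range(3)]
    q = sum(c[i] * c[j] * K[i][j] for i in range(3) for j in range(3))
    assert q >= -1e-9
```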
Common Kernel Functions
• Linear kernel
• Polynomial kernel
• Gaussian kernel
• Sigmoid kernel, pyramid kernel, string kernel, tree kernel…
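The first three kernels are a few lines each (a sketch; the parameter names degree, coef0, and gamma follow common library conventions and are assumptions, not from the slides):

```python
import math

def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, degree=3, coef0=1.0):
    # (x.z + coef0)^degree
    return (linear_kernel(x, z) + coef0) ** degree

def gaussian_kernel(x, z, gamma=0.5):
    # exp(-gamma * ||x - z||^2), also called the RBF kernel
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq)

assert gaussian_kernel([1.0, 2.0], [1.0, 2.0]) == 1.0  # K(x, x) = 1 for RBF
```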
Kernel SVM
• Training
• Decision
Soft-margin Kernel SVM
• Training
• Decision
Sequential Minimal Optimization
Coordinate Ascent
• Consider an unconstrained optimization problem
• Coordinate Ascent Algorithm
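Coordinate ascent maximizes over one variable at a time while holding the others fixed. A sketch on a made-up concave quadratic f(a1, a2) = −a1² − 2a2² + 2a1 + 4a2 + 2a1a2 (each inner update solves the one-variable maximization in closed form):

```python
def coordinate_ascent(iters=100):
    a1, a2 = 0.0, 0.0
    for _ in range(iters):
        # argmax over a1 with a2 fixed: df/da1 = -2*a1 + 2 + 2*a2 = 0
        a1 = 1.0 + a2
        # argmax over a2 with a1 fixed: df/da2 = -4*a2 + 4 + 2*a1 = 0
        a2 = 1.0 + a1 / 2.0
    return a1, a2

a1, a2 = coordinate_ascent()
# converges to the global maximizer (4, 3) of this concave function
```

SMO applies the same idea to the SVM dual, except the equality constraint forces it to update two multipliers at once.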
Coordinate Ascent
• An Example
Recall the Dual Problem in SVM
• The Dual Optimization Problem
• KKT Conditions
Coordinate Ascent in SVM
• Choose two coordinates to optimize each time
• Which two coordinates should be chosen?
The SMO Algorithm
The SMO Algorithm
• Variable elimination using the equality constraint
• Optimization by setting the gradient to zero
The SMO Updating
• Make use of
• We finally have
Adding Inequality Constraints
• Equality Constraints
• Inequality Constraints
Final Updating of the Two Multipliers
• In case of
• In case of
• Final Updating
Heuristics to Choose the Two Multipliers
• First, choose a Lagrange multiplier that violates the KKT conditions (Osuna's theorem).
• Second, choose the Lagrange multiplier that maximizes |E1 − E2|.
Updating of the Bias
• Choose b such that the KKT conditions hold (when α is not at the bounds)
• Updating of b
Convergence Condition
• Updating of the weights in the case of a linear kernel
• The problem is solved when all the Lagrange multipliers satisfy the KKT conditions (within a user-defined tolerance).
Questions?