  1. Machine Learning: Support Vector Machines. Rui Xia, Text Mining Group, Nanjing University of Science & Technology. rxia@njust.edu.cn

  2. Outline
  • Maximum Margin Linear Classifier
  • Duality Optimization
  • Soft-margin SVM
  • Kernel Functions
  • *Sequential Minimal Optimization
  • The Usage of SVM Toolkits

  3. Maximum Margin Linear Classifier

  4. Recall Previous Linear Classifiers
  • Perceptron Criterion
  • Cross Entropy Criterion (Logistic Regression)
  • Least Mean Square (LMS) Criterion
  • …
  Which linear hyper-plane is better? Which learning criterion to choose?

  5. Maximum Margin Criterion

  6. Distance from Point to Hyper-plane
  • Linear Model
  • Hyper-plane
  • Distance (positive side)

  7. Geometric Distance & Functional Distance
  • Distance (negative side)
  • Geometric distance (uniform expression)
  • Functional distance

  8. Parameter Scaling
  • Scaling the parameters by a scale factor
  • Geometric margin: independent of the scale factor
  • Functional margin: proportional to the scale factor
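The formulas on slides 6-8 were images that did not survive extraction; the following is a reconstruction of the standard definitions they refer to, in the usual notation (training pairs (x_i, y_i) with y_i ∈ {-1, +1}).

```latex
% Linear model and separating hyper-plane
f(x) = w^\top x + b, \qquad \text{hyper-plane: } w^\top x + b = 0

% Geometric distance of (x_i, y_i) to the hyper-plane (uniform for both sides)
\gamma_i = \frac{y_i \, (w^\top x_i + b)}{\lVert w \rVert}

% Functional distance
\hat{\gamma}_i = y_i \, (w^\top x_i + b)

% Scaling (w, b) \to (\kappa w, \kappa b) with \kappa > 0 leaves \gamma_i
% unchanged but multiplies \hat{\gamma}_i by \kappa.
```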

  9. Maximum Margin Criterion
  • Formulation 1

  10. Maximum Margin Criterion
  • Formulation 2

  11. Maximum Margin Criterion
  • Scaling Constraint
  • In this constraint

  12. Maximum Margin Criterion
  • Formulation 3
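The formulations themselves are not preserved; a reconstruction of the standard progression from slides 9-12, under the same notation as above:

```latex
% Formulation 1: maximize the geometric margin directly
\max_{w, b} \ \gamma
\quad \text{s.t.} \quad \frac{y_i (w^\top x_i + b)}{\lVert w \rVert} \ge \gamma,
\quad i = 1, \dots, N

% Scaling constraint: fix the functional margin of the closest points to 1,
% i.e. \min_i y_i (w^\top x_i + b) = 1, so the geometric margin is 1 / \lVert w \rVert.

% Formulation 3: the standard primal (maximizing 1 / \lVert w \rVert is
% equivalent to minimizing \lVert w \rVert^2 / 2)
\min_{w, b} \ \tfrac{1}{2} \lVert w \rVert^2
\quad \text{s.t.} \quad y_i (w^\top x_i + b) \ge 1, \quad i = 1, \dots, N
```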

  13. Duality Optimization

  14. Lagrange Multiplier
  • In the case of an equality constraint

  15. An Example

  16. Lagrange Multiplier
  • In the case of an inequality constraint (active vs. inactive)

  17. An Illustration

  18. Lagrange Multiplier
  • In the case of multiple equality and inequality constraints
  • Karush–Kuhn–Tucker (KKT) conditions: stationarity, primal feasibility, dual feasibility, complementary condition
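The conditions themselves were shown as equations on the slide; a reconstruction of their standard statement:

```latex
% For \min_w f(w) s.t. g_i(w) \le 0 and h_j(w) = 0, with multipliers
% \alpha_i (inequalities) and \beta_j (equalities):
\nabla_w f(w^*) + \sum_i \alpha_i^* \nabla_w g_i(w^*)
  + \sum_j \beta_j^* \nabla_w h_j(w^*) = 0 \quad \text{(stationarity)}

g_i(w^*) \le 0, \qquad h_j(w^*) = 0 \quad \text{(primal feasibility)}

\alpha_i^* \ge 0 \quad \text{(dual feasibility)}

\alpha_i^* \, g_i(w^*) = 0 \quad \text{(complementary condition)}
```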

  19. Generalized Lagrangian and Duality
  • Primal Optimization Problem
  • Generalized Lagrangian

  20. Min-max of Lagrangian

  21. Primal Problem & Dual Problem
  • The primal problem (min-max of Lagrangian)
  • The dual problem (max-min of Lagrangian)
  • Max-min vs. min-max: when does the equality hold?
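Reconstructing the standard material behind slides 19-21 (the equations were images):

```latex
% Generalized Lagrangian for the primal problem
%   \min_w f(w)  s.t.  g_i(w) \le 0,  h_j(w) = 0:
L(w, \alpha, \beta) = f(w) + \sum_i \alpha_i \, g_i(w) + \sum_j \beta_j \, h_j(w),
\qquad \alpha_i \ge 0

% The inner max of the min-max recovers the primal objective:
\max_{\alpha \ge 0, \, \beta} L(w, \alpha, \beta) =
\begin{cases} f(w) & \text{if } w \text{ is feasible} \\ +\infty & \text{otherwise} \end{cases}

% Weak duality always holds between the two orderings:
d^* = \max_{\alpha \ge 0, \, \beta} \ \min_w L(w, \alpha, \beta)
\;\le\; \min_w \ \max_{\alpha \ge 0, \, \beta} L(w, \alpha, \beta) = p^*
```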

  22. Equivalency of the Two Problems
  • The equality holds when
    – f and the g_i's are convex, and the h_i's are affine;
    – the g_i's are strictly feasible: there exists some w such that g_i(w) < 0 (Slater's condition).
  • Equivalency of the primal and dual problems: d* = p*

  23. Karush-Kuhn-Tucker (KKT) Conditions
  • Furthermore, the solutions of the primal and dual problems satisfy the KKT conditions: stationarity, primal feasibility, dual feasibility, and the complementary condition.
  • Under the convexity and feasibility assumptions above, the KKT conditions are a sufficient and necessary condition.

  24. Lagrangian for SVM
  • The optimization problem of SVM
  • The Lagrangian
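A reconstruction of the standard primal and its Lagrangian:

```latex
% SVM primal
\min_{w, b} \ \tfrac{1}{2} \lVert w \rVert^2
\quad \text{s.t.} \quad y_i (w^\top x_i + b) \ge 1, \quad i = 1, \dots, N

% Its Lagrangian, with one multiplier \alpha_i \ge 0 per constraint
L(w, b, \alpha) = \tfrac{1}{2} \lVert w \rVert^2
  - \sum_{i=1}^{N} \alpha_i \left[ y_i (w^\top x_i + b) - 1 \right]
```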

  25. Minimization of the Lagrangian
  • Take the derivative of the Lagrangian
  • Plug back into the Lagrangian

  26. Dual Problem of SVM
  • Dual Problem (its constraints guarantee that the KKT conditions are satisfied)
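Reconstructing the standard derivation sketched on slides 25-26:

```latex
% Setting the gradients of L(w, b, \alpha) to zero:
\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_i \alpha_i y_i x_i,
\qquad
\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_i \alpha_i y_i = 0

% Plugging w back into L eliminates w and b and gives the dual problem:
\max_{\alpha} \ \sum_i \alpha_i
  - \tfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \, x_i^\top x_j
\quad \text{s.t.} \quad \alpha_i \ge 0, \quad \sum_i \alpha_i y_i = 0
```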

  27. Why “Support Vector”?
  • Decision function
  • KKT conditions
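The standard argument behind the name, reconstructed from the KKT complementary condition:

```latex
% \alpha_i [ y_i (w^\top x_i + b) - 1 ] = 0, so \alpha_i > 0 only for points
% with y_i (w^\top x_i + b) = 1, i.e. points exactly on the margin.
% Only these "support vectors" enter the decision function:
f(x) = \operatorname{sign}\Big( \sum_{i:\, \alpha_i > 0} \alpha_i y_i \, x_i^\top x + b \Big)
```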

  28. The Value of the Bias
  • b can be computed from any support vector: one equation when x_i is a positive support vector, another when x_i is a negative support vector.
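A reconstruction of the standard bias computation the slide refers to:

```latex
% For any support vector x_s (\alpha_s > 0): y_s (w^\top x_s + b) = 1.
% Multiplying by y_s and using y_s^2 = 1:
b = y_s - w^\top x_s = y_s - \sum_{i:\, \alpha_i > 0} \alpha_i y_i \, x_i^\top x_s
% In practice, b is often averaged over all support vectors for stability.
```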

  29. One Remaining Problem
  • Decision function: how to compute the alphas?
  • Dual problem of SVM: how to solve the dual optimization problem?

  30. Soft-margin SVM

  31. Linearly Non-separable Case
  • Linearly separable vs. linearly non-separable (illustration)

  32. Soft Margin Criterion
  • Maximum margin vs. soft margin
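A reconstruction of the standard soft-margin objective:

```latex
% Soft-margin primal: slack \xi_i relaxes each margin constraint,
% and C > 0 trades margin width against violations
\min_{w, b, \xi} \ \tfrac{1}{2} \lVert w \rVert^2 + C \sum_{i=1}^{N} \xi_i
\quad \text{s.t.} \quad y_i (w^\top x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0
```

The slack value separates three regimes, presumably the three types on the next slide: ξ_i = 0 (outside or on the margin), 0 < ξ_i ≤ 1 (inside the margin but correctly classified), and ξ_i > 1 (misclassified).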

  33. Three Types of Slacks

  34. Lagrangian for Soft-margin SVM
  • Recall the equivalency of the primal and dual problems
  • Lagrangian form

  35. Dual Problem for Soft-margin SVM
  • Gradient
  • Plug back into the Lagrangian
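Reconstructing the standard derivation behind slides 34-35:

```latex
% Lagrangian with multipliers \alpha_i \ge 0 (margin constraints)
% and \mu_i \ge 0 (for \xi_i \ge 0):
L = \tfrac{1}{2} \lVert w \rVert^2 + C \sum_i \xi_i
  - \sum_i \alpha_i \left[ y_i (w^\top x_i + b) - 1 + \xi_i \right]
  - \sum_i \mu_i \xi_i

% \partial L / \partial \xi_i = 0 gives C - \alpha_i - \mu_i = 0,
% hence 0 \le \alpha_i \le C. The dual is otherwise unchanged:
\max_{\alpha} \ \sum_i \alpha_i
  - \tfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \, x_i^\top x_j
\quad \text{s.t.} \quad 0 \le \alpha_i \le C, \quad \sum_i \alpha_i y_i = 0
```

The box constraint 0 ≤ α_i ≤ C is the only difference from the hard-margin dual, which is the comparison the next slide draws.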

  36. Maximum-margin SVM vs. Soft-margin SVM
  • Maximum-margin SVM
  • Soft-margin SVM

  37. KKT Complementarity Condition
  • Two KKT complementarity conditions
  • Some useful conclusions
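The two conditions and their standard consequences, reconstructed:

```latex
% \alpha_i [ y_i (w^\top x_i + b) - 1 + \xi_i ] = 0
% and \mu_i \xi_i = (C - \alpha_i) \xi_i = 0 together imply,
% writing f(x_i) = w^\top x_i + b:
\alpha_i = 0 \;\Rightarrow\; y_i f(x_i) \ge 1 \quad \text{(outside or on the margin)}

0 < \alpha_i < C \;\Rightarrow\; \xi_i = 0, \ y_i f(x_i) = 1 \quad \text{(exactly on the margin)}

\alpha_i = C \;\Rightarrow\; y_i f(x_i) \le 1 \quad \text{(inside the margin or misclassified)}
```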

  38. Slacks and Support Vectors

  39. Kernel Functions

  40. Low-dimensional-nonseparable to Higher-dimensional-separable

  41. From Low Dimension to Higher Dimension
  • Feature space mapping: from low-dimensional-nonseparable to higher-dimensional-separable

  42. Kernel Functions
  • Definition: inner product in the higher-dimensional feature space
  • An example
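A reconstruction of the standard definition and the usual worked example (the slide's own example is not preserved):

```latex
% A kernel computes the inner product in the mapped feature space
% without forming \phi explicitly:
K(x, z) = \langle \phi(x), \phi(z) \rangle

% Worked example for x, z \in \mathbb{R}^2:
K(x, z) = (x^\top z)^2 = \langle \phi(x), \phi(z) \rangle
\quad \text{with} \quad
\phi(x) = \left( x_1^2, \ \sqrt{2} \, x_1 x_2, \ x_2^2 \right)^\top
```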

  43. SVM in Higher-dimensional Feature Space
  • Decision function
  • Training process

  44. Kernel Trick in SVM
  – Sometimes it is hard to know the exact projection function, but relatively easy to know the kernel function.
  – In SVM, all calculations involving feature vectors take the form of inner products.
  – Therefore, we only need to know the kernel function used in SVM; there is no need to know the exact projection function.

  45. Mercer Condition
  • Kernel matrix
    – For any finite set of points
    – Element of the kernel matrix
  • A valid kernel satisfies
    – Symmetric
    – Positive semi-definite
  • Mercer theorem

  46. Common Kernel Functions
  • Linear kernel
  • Polynomial kernel
  • Gaussian kernel
  • Sigmoid kernel, pyramid kernel, string kernel, tree kernel…
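Standard forms of the first three (the slide's exact parameterizations are not preserved, so the constants c, d, σ here are generic):

```latex
K(x, z) = x^\top z                                                      % linear
K(x, z) = (x^\top z + c)^d                                              % polynomial
K(x, z) = \exp\left( -\frac{\lVert x - z \rVert^2}{2 \sigma^2} \right)  % Gaussian (RBF)
```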

  47. Kernel SVM
  • Training
  • Decision
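Reconstructing the standard kernelized training and decision rules, obtained by replacing every inner product with the kernel:

```latex
% Training (hard margin shown; the soft-margin version adds 0 \le \alpha_i \le C):
\max_{\alpha} \ \sum_i \alpha_i
  - \tfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \, K(x_i, x_j)
\quad \text{s.t.} \quad \alpha_i \ge 0, \quad \sum_i \alpha_i y_i = 0

% Decision:
f(x) = \operatorname{sign}\Big( \sum_{i:\, \alpha_i > 0} \alpha_i y_i \, K(x_i, x) + b \Big)
```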

  48. Soft-margin Kernel SVM
  • Training
  • Decision
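To connect this with the outline's "The Usage of SVM Toolkits" item, a minimal scikit-learn sketch of training a soft-margin Gaussian-kernel SVM; the dataset and hyper-parameter values are illustrative, not from the slides.

```python
# Minimal sketch: soft-margin RBF-kernel SVM with scikit-learn.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# A toy non-linearly-separable dataset.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# C is the soft-margin penalty; gamma controls the Gaussian (RBF) kernel width.
clf = SVC(kernel="rbf", C=1.0, gamma=0.5)
clf.fit(X_train, y_train)

print("support vectors per class:", clf.n_support_)
print("test accuracy:", clf.score(X_test, y_test))
```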

  49. Sequential Minimal Optimization

  50. Coordinate Ascent
  • Consider an unconstrained optimization problem
  • Coordinate Ascent Algorithm
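The slide's algorithm box is not preserved; here is a minimal generic sketch of coordinate ascent, using a coarse grid search as a stand-in for the one-dimensional maximization (in SMO that inner step is solved analytically). The objective W and all names are illustrative.

```python
import numpy as np

def coordinate_ascent(W, alpha, n_iters=100):
    """Maximize W over alpha, one coordinate at a time, others held fixed."""
    for _ in range(n_iters):
        for i in range(len(alpha)):
            # 1-D maximization over alpha[i] via a coarse grid search
            # (a real implementation would solve this step analytically).
            candidates = alpha[i] + np.linspace(-1.0, 1.0, 41)
            values = []
            for c in candidates:
                trial = alpha.copy()
                trial[i] = c
                values.append(W(trial))
            alpha[i] = candidates[int(np.argmax(values))]
    return alpha

# Example: maximize W(a) = -(a1^2 + 2*a2^2 - 2*a1*a2 - 2*a2),
# whose unique maximizer is (1, 1).
W = lambda a: -(a[0]**2 + 2 * a[1]**2 - 2 * a[0] * a[1] - 2 * a[1])
print(coordinate_ascent(W, np.zeros(2)))  # converges toward (1, 1)
```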

  51. Coordinate Ascent
  • An Example

  52. Recall the Dual Problem in SVM
  • The Dual Optimization Problem
  • KKT Conditions

  53. Coordinate Ascent in SVM
  • Choose two coordinates to optimize each time (the equality constraint Σ_i α_i y_i = 0 means a single α cannot change alone, so a pair is updated jointly)
  • Which two coordinates should be chosen?

  54. The SMO Algorithm

  55. The SMO Algorithm
  • Variable elimination using the equality constraint
  • Optimization by setting the gradient to zero

  56. The SMO Updating
  • Make use of
  • We finally have
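The update itself was an equation image; a reconstruction of the standard SMO update (following Platt's 1998 formulation, with E_i the prediction error and η the second derivative along the constraint direction):

```latex
E_i = f(x_i) - y_i, \qquad
\eta = K(x_1, x_1) + K(x_2, x_2) - 2 K(x_1, x_2)

% Unconstrained update of the second multiplier:
\alpha_2^{\text{new}} = \alpha_2^{\text{old}} + \frac{y_2 (E_1 - E_2)}{\eta}

% The equality constraint then determines the first multiplier:
\alpha_1^{\text{new}} = \alpha_1^{\text{old}}
  + y_1 y_2 \left( \alpha_2^{\text{old}} - \alpha_2^{\text{new}} \right)
```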

  57. Adding Inequality Constraints
  • Equality Constraints
  • Inequality Constraints

  58. Final Updating of Two Multipliers
  • In the case of y_1 ≠ y_2
  • In the case of y_1 = y_2
  • Final Updating
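The standard clipping bounds and final update, reconstructed:

```latex
% Feasible segment for \alpha_2:
% y_1 \ne y_2: L = \max(0, \alpha_2 - \alpha_1), \quad H = \min(C, C + \alpha_2 - \alpha_1)
% y_1 = y_2:   L = \max(0, \alpha_1 + \alpha_2 - C), \quad H = \min(C, \alpha_1 + \alpha_2)

\alpha_2^{\text{new,clipped}} =
\begin{cases}
H, & \alpha_2^{\text{new}} > H \\
\alpha_2^{\text{new}}, & L \le \alpha_2^{\text{new}} \le H \\
L, & \alpha_2^{\text{new}} < L
\end{cases}
```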

  59. Heuristics to Choose Two Multipliers
  • First, choose a Lagrange multiplier that violates the KKT conditions (Osuna's theorem)
  • Second, choose the Lagrange multiplier that maximizes |E_1 - E_2|

  60. Updating of the Bias
  • Choose the b that makes the KKT conditions hold (when the alphas are not at the bounds)
  • Updating of b
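The standard SMO bias update, reconstructed under the same notation:

```latex
b_1 = b - E_1 - y_1 (\alpha_1^{\text{new}} - \alpha_1) K(x_1, x_1)
            - y_2 (\alpha_2^{\text{new}} - \alpha_2) K(x_1, x_2)

b_2 = b - E_2 - y_1 (\alpha_1^{\text{new}} - \alpha_1) K(x_1, x_2)
            - y_2 (\alpha_2^{\text{new}} - \alpha_2) K(x_2, x_2)

% Take b = b_1 if 0 < \alpha_1^{new} < C; b = b_2 if 0 < \alpha_2^{new} < C;
% otherwise b = (b_1 + b_2) / 2.
```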

  61. Convergence Condition
  • Updating of the weights in the case of a linear kernel
  • The problem has been solved when all the Lagrange multipliers satisfy the KKT conditions (within a user-defined tolerance).

  62. Questions?
