
Training Linear SVMs – by Thorsten Joachims and Prasad Seemakurthi (PowerPoint PPT presentation)



  1. Training Linear SVMs. By: Thorsten Joachims, Prasad Seemakurthi

  2. Agenda • What is an SVM • Kernel • Hard Margins • Soft Margins • Linear Algorithm • A Few Examples • Conclusion

  3. SVM – Curtain Raiser • Linear classification algorithm • SVMs have a clever way to prevent over-fitting • SVMs have a clever way to use a huge number of features without requiring nearly as much computation as seems to be necessary
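As a concrete starting point, here is a minimal sketch of what a linear classifier computes: the prediction is sign(w · x + b). The weight vector w and bias b below are hypothetical values chosen only for illustration.

```python
# A minimal sketch of a linear classifier: predict sign(w . x + b).
# The weights and bias are hypothetical, chosen only for illustration.
import numpy as np

w = np.array([2.0, -1.0])   # hypothetical weight vector
b = -0.5                    # hypothetical bias

def predict(x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if np.dot(w, x) + b >= 0 else -1

print(predict(np.array([1.0, 0.0])))   # w.x + b = 1.5  -> +1
print(predict(np.array([0.0, 2.0])))   # w.x + b = -2.5 -> -1
```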

  4. Linear Classifiers (Intuition) [Figure: two classes of training points with a candidate linear decision boundary]

  5. Linear Classifiers [Figure: training points labeled +1 and -1 with several candidate separating lines] Any of these would be fine … but which is best?

  6. Linear Classifier

  7. Maximum Margin 1. Maximizing the margin is good according to intuition and PAC theory 2. Implies that only the support vectors are important 3. Empirically it works well [Figure: the classifier with the maximum margin; the points the margin pushes up against are the support vectors] This simplest kind of SVM is called a linear SVM
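The sketch below fits a maximum-margin linear SVM on a tiny, made-up dataset using scikit-learn (an assumption; the slides name no library) and prints the support vectors, which are the only points that determine the hyperplane. A very large C approximates the hard-margin case.

```python
# Sketch: a maximum-margin linear SVM on invented toy data (scikit-learn).
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],    # class +1
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.0]])   # class -1
y = np.array([1, 1, 1, -1, -1, -1])

# A very large C approximates the hard-margin SVM.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

print("w =", clf.coef_[0], "b =", clf.intercept_[0])
# Only the support vectors determine the separating hyperplane.
print("support vectors:\n", clf.support_vectors_)
```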

  8. Maximizing the Margin

  9. Why Maximize the Margin? • Points near the decision surface → uncertain classification decisions (50% either way) • A classifier with a large margin makes no low-certainty classification decisions • Gives a classification safety margin w.r.t. slight errors in measurement

  10. Why Maximize the Margin? • SVM classifier: large margin around the decision boundary • Compare to a decision hyperplane: place a fat separator between the classes • Fewer choices of where it can be put • Decreased memory capacity • Increased ability to correctly generalize to test data
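The "safety margin" intuition can be made concrete: the geometric distance of a point x from the hyperplane w · x + b = 0 is |w · x + b| / ||w||, so points near the surface are exactly the uncertain ones. The values of w, b, and the points in this sketch are hypothetical.

```python
# Sketch: distance of a point x to the hyperplane w . x + b = 0 is
# |w . x + b| / ||w||; points with a small distance are the uncertain ones.
import numpy as np

w, b = np.array([1.0, 1.0]), -3.0   # hypothetical hyperplane

def distance(x):
    return abs(np.dot(w, x) + b) / np.linalg.norm(w)

print(distance(np.array([1.4, 1.5])))  # ~0.07: near the surface, uncertain
print(distance(np.array([5.0, 5.0])))  # ~4.95: deep inside a class, safe
```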

  11. Linear SVM Mathematically

  12. Linear SVM Mathematically (continued)

  13. Linear (Hard-Margin) SVM – Formulation

  14. Solving the Optimization Problem • Find w and b such that \Phi(w) = \frac{1}{2} w^T w is minimized, and for all \{(x_i, y_i)\}: y_i (w^T x_i + b) \ge 1 • The solution involves constructing a dual problem in which a Lagrange multiplier \alpha_i is associated with every constraint in the primal problem
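As a rough illustration of the primal problem above (a generic numerical sketch, not the algorithm from the slides), the code below minimizes (1/2)||w||² subject to yᵢ(w · xᵢ + b) ≥ 1 using SciPy's general-purpose constrained solver on four made-up separable points.

```python
# Sketch: solving the hard-margin primal as a small constrained optimization
# with SciPy on hypothetical 2-D data: minimize (1/2)||w||^2 subject to
# y_i (w . x_i + b) >= 1 for every training point.
import numpy as np
from scipy.optimize import minimize

X = np.array([[1.0, 1.0], [2.0, 2.5], [4.0, 4.0], [5.0, 4.5]])
y = np.array([1, 1, -1, -1])

def objective(v):            # v = [w1, w2, b]
    w = v[:2]
    return 0.5 * np.dot(w, w)

# One inequality constraint per training point (fun >= 0 means satisfied).
constraints = [{"type": "ineq",
                "fun": lambda v, xi=xi, yi=yi: yi * (np.dot(v[:2], xi) + v[2]) - 1}
               for xi, yi in zip(X, y)]

res = minimize(objective, x0=np.zeros(3), constraints=constraints)
print("w =", res.x[:2], "b =", res.x[2])
```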

  15. Dataset with Noise – Problem?

  16. Soft Margin Classification • Slack variables \xi_i can be added to allow misclassification of difficult or noisy data • What should our quadratic optimization criterion be? Minimize \frac{1}{2} w^T w + C \sum_{i=1}^{N} \xi_i
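The slack variables and the soft-margin objective can be computed directly: ξᵢ = max(0, 1 − yᵢ(w · xᵢ + b)) measures how far point i sits on the wrong side of its margin, and the objective adds C times the total slack to the margin term. All values in this sketch (w, b, C, and the data) are hypothetical.

```python
# Sketch: evaluating the soft-margin objective (1/2)||w||^2 + C * sum(xi_i),
# where xi_i = max(0, 1 - y_i (w . x_i + b)) is the slack of point i.
import numpy as np

X = np.array([[1.0, 1.0], [2.0, 2.0], [4.0, 4.0], [2.5, 2.5]])  # last point is noisy
y = np.array([1, 1, -1, -1])
w, b, C = np.array([-1.0, -1.0]), 5.0, 1.0   # hypothetical model and cost

slack = np.maximum(0.0, 1.0 - y * (X @ w + b))   # xi_i per training point
objective = 0.5 * np.dot(w, w) + C * slack.sum()
print("slacks:", slack)        # only the noisy point needs slack
print("objective:", objective)
```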

  17. Hard vs. Soft Margin SVM • Hard-margin doesn't require guessing the cost parameter (requires no parameters at all) • Soft-margin always has a solution • Soft-margin is more robust to outliers: smoother decision surfaces (in the non-linear case)
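A quick way to see the trade-off is to vary C: a huge C approximates the hard margin and contorts the boundary to satisfy every point, while a small C tolerates the outlier. The data below, including the outlier, is invented for illustration.

```python
# Sketch: comparing a hard-ish margin (huge C) with a soft margin (small C)
# on invented data containing one outlier.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [1.5, 2], [2, 1],       # class +1
              [5, 5], [6, 5.5], [1.8, 1.8]])  # class -1; last one is an outlier
y = np.array([1, 1, 1, -1, -1, -1])

for C in (1e6, 0.1):   # 1e6 ~ hard margin; 0.1 = tolerant soft margin
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C}: w={clf.coef_[0]}, b={clf.intercept_[0]}")
# The small-C model largely ignores the outlier, giving a more robust
# boundary between the two clusters.
```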

  18. Algorithm

  19. SVM Applications • SVMs have been used successfully in many real-world applications • Text (and hypertext) categorization • Image classification • Bioinformatics (protein classification, cancer classification) • Hand-written character classification
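For the text-categorization application, a common pattern (an assumption here; the slides do not specify a pipeline) is TF-IDF features fed to a linear SVM. The tiny documents and labels below are invented purely for illustration.

```python
# Sketch of the text-categorization use case: TF-IDF features + linear SVM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

docs = ["cheap pills buy now", "meeting agenda attached",
        "limited offer buy cheap", "project status and agenda"]
labels = ["spam", "ham", "spam", "ham"]

# Vectorize the text and train a linear SVM in one pipeline.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(docs, labels)
print(model.predict(["buy cheap pills", "agenda for the meeting"]))
```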
