Chapter 8. Support Vector Machines
Wei Pan
Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455
Email: weip@biostat.umn.edu
PubH 7475/8475
© Wei Pan
Introduction
◮ SVM: §4.5.2, 12.1-12.3; Vapnik (1996).
◮ Training data: (Y_i, X_i), Y_i = ±1, i = 1, ..., n.
◮ Fig 4.14: with two separable classes there are many possible separating hyperplanes, e.g., least squares (or LDA): 1 misclassification; perceptron: the solution depends on the (random) starting values; SVC: maximize the "separation" (margin) between the two classes; Fig 4.16.
FIGURE 4.14 (Elements of Statistical Learning, 2nd Ed., © Hastie, Tibshirani & Friedman 2009, Chap 4). A toy example with two classes separable by a hyperplane. The orange line is the least squares solution, which misclassifies one of the training points. Also shown are two blue separating hyperplanes found by the perceptron learning algorithm with different random starts.
FIGURE 4.16 (Elements of Statistical Learning, 2nd Ed., © Hastie, Tibshirani & Friedman 2009, Chap 4). The same data as in Figure 4.14. The shaded region delineates the maximum margin separating the two classes. There are three support points indicated, which lie on the boundary of the margin, and the optimal separating hyperplane (blue line) bisects the slab. Included in the figure is the boundary found using logistic regression (red line), which is very close to the optimal separating hyperplane (see Section 12.3.3).
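A minimal numerical illustration of the idea in Figs 4.14/4.16 (not part of the original slides): fit a linear SVC on toy separable data and inspect the maximal-margin solution and its support vectors. The use of scikit-learn/NumPy, the toy data, and all variable names are assumptions made only for illustration.

# Minimal sketch (assumes NumPy and scikit-learn are installed).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two separable Gaussian classes in R^2, labels coded as +/- 1.
X = np.vstack([rng.normal([-2, -2], 0.5, size=(20, 2)),
               rng.normal([2, 2], 0.5, size=(20, 2))])
y = np.r_[-np.ones(20), np.ones(20)]

# A very large C approximates the hard-margin (maximal-margin) classifier.
fit = SVC(kernel="linear", C=1e6).fit(X, y)

beta, beta0 = fit.coef_.ravel(), fit.intercept_[0]
print("beta:", beta, "beta0:", beta0)
print("margin width 2/||beta||:", 2 / np.linalg.norm(beta))
print("support vectors:\n", fit.support_vectors_)   # the points lying on the margin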
Review
◮ Hyperplane L: f(x) = β_0 + β′x = 0.
◮ 1) Any x_1, x_2 ∈ L ⇒ β′(x_1 − x_2) = 0 ⇒ β ⊥ L; β* = β/||β|| is the unit vector normal to L.
◮ 2) x_0 ∈ L ⇒ β_0 + β′x_0 = 0.
◮ 3) The signed distance of any x to L is β*′(x − x_0) = (β′x − β′x_0)/||β|| = (β′x + β_0)/||β|| = f(x)/||β||, so f(x) ∝ the signed distance of x to L. Fig 4.15.
FIGURE 4.15 (Elements of Statistical Learning, 2nd Ed., © Hastie, Tibshirani & Friedman 2009, Chap 4). The linear algebra of a hyperplane (affine set): β_0 + β′x = 0, with unit normal β* and a point x_0 on the hyperplane.
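A minimal numerical check of item 3) above (not part of the original slides): for the hyperplane f(x) = β_0 + β′x = 0, the signed distance of x to the hyperplane equals f(x)/||β||. Only NumPy is assumed, and the specific numbers are illustrative.

# Minimal sketch (assumes NumPy is installed).
import numpy as np

beta = np.array([3.0, 4.0])      # normal vector of the hyperplane
beta0 = -5.0
f = lambda x: beta0 + beta @ x   # f(x) = beta0 + beta'x

x = np.array([2.0, 1.0])         # an arbitrary point
x0 = np.array([1.0, 0.5])        # a point on the hyperplane: beta0 + beta'x0 = 0
assert abs(f(x0)) < 1e-12

beta_star = beta / np.linalg.norm(beta)          # unit normal beta* = beta/||beta||
signed_dist = beta_star @ (x - x0)               # projection of (x - x0) onto beta*
print(signed_dist, f(x) / np.linalg.norm(beta))  # the two quantities agree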
Case I: two classes are separable
◮ WLOG, assume ||β|| = 1 in f(x) = β_0 + β′x. Classifier: G(x) = sign(f(x)).
◮ Since the two classes are separable:
1) there exists an f(x) = β_0 + β′x = 0 s.t. Y_i f(X_i) > 0 for all i;
2) there exists an f(x) = β_0 + β′x = 0 s.t. the margin is maximized; Fig 12.1.
◮ Optimization problem:
max_{β_0, β, ||β||=1} M s.t. Y_i(β_0 + β′X_i) ≥ M for i = 1, ..., n.
Q: what is β_0 + β′X_i? (With ||β|| = 1, it is the signed distance of X_i to the hyperplane.)
◮ Or, dropping the constraint ||β|| = 1:
max_{β_0, β} M s.t. Y_i(β_0 + β′X_i)/||β|| ≥ M for i = 1, ..., n.
◮ Set ||β|| = 1/M; then
min_{β_0, β} ||β||, or equivalently min_{β_0, β} ½||β||², s.t. Y_i(β_0 + β′X_i) ≥ 1 for i = 1, ..., n.
(A numerical sketch of this problem follows Fig 12.1 below.)
FIGURE 12.1 (Elements of Statistical Learning, 2nd Ed., © Hastie, Tibshirani & Friedman 2009, Chap 12). Support vector classifiers. The left panel shows the separable case. The decision boundary is the solid line, while broken lines bound the shaded maximal margin of width 2M = 2/||β||. The right panel shows the nonseparable (overlap) case. The points labeled ξ*_j are on the wrong side of their margin by an amount ξ*_j = M ξ_j; points on the correct side have ξ*_j = 0. The margin is maximized subject to a total budget Σ ξ_i ≤ constant. Hence Σ ξ*_j is the total distance of points on the wrong side of their margin.
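A minimal sketch of the separable-case optimization above, posed directly as the quadratic program min ½||β||² s.t. Y_i(β_0 + β′X_i) ≥ 1. The choice of cvxpy (the course software is not specified here), the toy data, and the variable names are assumptions for illustration only.

# Minimal sketch (assumes NumPy and cvxpy are installed).
#   min (1/2)||beta||^2   s.t.   Y_i (beta0 + beta' X_i) >= 1 for all i.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, p = 40, 2
# Two well-separated classes, so a separating hyperplane exists
# (the hard-margin problem is infeasible otherwise).
X = np.vstack([rng.normal(-3.0, 0.7, size=(n // 2, p)),
               rng.normal(+3.0, 0.7, size=(n // 2, p))])
y = np.r_[-np.ones(n // 2), np.ones(n // 2)]

beta = cp.Variable(p)
beta0 = cp.Variable()
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(beta)),
                  [cp.multiply(y, X @ beta + beta0) >= 1])
prob.solve()

margin = 1 / np.linalg.norm(beta.value)          # M = 1/||beta||
on_margin = np.isclose(y * (X @ beta.value + beta0.value), 1, atol=1e-4)
print("margin M:", margin, "number of support points:", int(np.sum(on_margin)))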
◮ Convex programming: a quadratic objective with linear inequality constraints.
◮ Rewritten as a Lagrange function, ... the solution β is determined by a few support points/vectors X_i. Fig 4.16: 3 SVs.
◮ Remarks: 1) SVC: a large margin leads to better separation/prediction on test data!?
◮ 2) Robustness: β_0 and β are determined only by the SVs, but ...
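For reference, a supplement (not part of the original slide) with the standard results behind the remarks above, as in ESL §4.5.2 and §12.2.1, written in the notation of these slides:
◮ Lagrange (primal) function: L_P = ½||β||² − Σ_{i=1}^n α_i [Y_i(β_0 + β′X_i) − 1], with multipliers α_i ≥ 0.
◮ Setting the derivatives w.r.t. β and β_0 to zero: β = Σ_i α_i Y_i X_i and Σ_i α_i Y_i = 0.
◮ Substituting back gives the (Wolfe) dual, maximized over α_i ≥ 0: L_D = Σ_i α_i − ½ Σ_i Σ_j α_i α_j Y_i Y_j X_i′X_j.
◮ KKT: α_i [Y_i(β_0 + β′X_i) − 1] = 0 for all i, so α_i > 0 only for points on the margin boundary; these are the support vectors, and β = Σ_i α_i Y_i X_i depends on them alone.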
Case II: non-separable
◮ Introduce slack variables ξ_i:
max_{β_0, β, ||β||=1} M s.t. Y_i(β_0 + β′X_i) ≥ M(1 − ξ_i), ξ_i ≥ 0 for i = 1, ..., n, and Σ_{i=1}^n ξ_i ≤ B.
◮ Rewrite:
min_{β_0, β} ½||β||² + C Σ_{i=1}^n ξ_i s.t. ξ_i ≥ 0 and Y_i(β_0 + β′X_i) ≥ 1 − ξ_i ∀ i,
where C is the "cost", a tuning parameter. Fig 12.2
◮ ... similar results as before (e.g., convex programming, SVs). (A numerical sketch follows below.)
(8000): Computing, §12.2.1.
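A minimal sketch of the soft-margin problem above, written directly with slack variables ξ_i and cost C. As before, cvxpy, the toy data, and the value of C are assumptions for illustration only; the same problem is what standard SVM software solves (usually via its dual).

# Minimal sketch (assumes NumPy and cvxpy are installed).
#   min (1/2)||beta||^2 + C * sum_i xi_i
#   s.t. xi_i >= 0 and Y_i (beta0 + beta' X_i) >= 1 - xi_i for all i.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
n, p, C = 60, 2, 1.0                              # C is the "cost" tuning parameter
X = np.vstack([rng.normal(-1.0, 1.0, size=(n // 2, p)),
               rng.normal(+1.0, 1.0, size=(n // 2, p))])   # overlapping classes
y = np.r_[-np.ones(n // 2), np.ones(n // 2)]

beta = cp.Variable(p)
beta0 = cp.Variable()
xi = cp.Variable(n, nonneg=True)                  # slack variables xi_i >= 0

obj = cp.Minimize(0.5 * cp.sum_squares(beta) + C * cp.sum(xi))
cons = [cp.multiply(y, X @ beta + beta0) >= 1 - xi]
cp.Problem(obj, cons).solve()

print("beta:", beta.value, "beta0:", beta0.value)
print("points with positive slack (wrong side of their margin):",
      int(np.sum(xi.value > 1e-6)))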