Support Vector Machines (SVMs). Semi-Supervised Learning.


  1. Support Vector Machines (SVMs) • Semi-Supervised Learning • Semi-Supervised SVMs. Maria-Florina Balcan, 03/25/2015

  2. Support Vector Machines (SVMs). One of the most theoretically well-motivated and practically most effective classification algorithms in machine learning. Directly motivated by Margins and Kernels!

  3. Geometric Margin. WLOG consider homogeneous linear separators [w_0 = 0]. Definition: the margin of example x w.r.t. a linear separator w is the distance from x to the plane w ⋅ x = 0. If ||w|| = 1, the margin of x w.r.t. w is |w ⋅ x|. (Figure: the margins of two examples x_1 and x_2 w.r.t. the separator w.)

  4. Geometric Margin. Definition: the margin of example x w.r.t. a linear separator w is the distance from x to the plane w ⋅ x = 0. Definition: the margin γ_w of a set of examples S w.r.t. a linear separator w is the smallest margin over points x ∈ S. Definition: the margin γ of a set of examples S is the maximum γ_w over all linear separators w. (Figure: positive and negative examples at distance γ on either side of the separator w.)
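
A minimal numpy sketch of these two definitions, on toy data of my own; margin_of_example and margin_of_set are hypothetical helper names, not anything from the deck. Dividing by ||w|| makes the computation valid for any nonzero w, which reduces to |w ⋅ x| when ||w|| = 1.

```python
import numpy as np

def margin_of_example(w, x):
    # Distance from x to the plane w . x = 0; equals |w . x| when ||w|| = 1.
    return abs(np.dot(w, x)) / np.linalg.norm(w)

def margin_of_set(w, X):
    # gamma_w: the smallest margin over all examples (rows of X).
    return min(margin_of_example(w, x) for x in X)

w = np.array([3.0, 4.0])                                  # any nonzero separator
X = np.array([[1.0, 2.0], [2.0, -1.0], [0.5, 0.5]])       # toy examples
print(margin_of_example(w, X[0]))                         # margin of one example
print(margin_of_set(w, X))                                # gamma_w for the whole set
```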

  5. Margin: Important Theme in ML. Both sample-complexity and algorithmic implications. Sample/mistake-bound complexity: if the margin is large, the number of mistakes the Perceptron makes is small (independent of the dimension of the space)! If the margin γ is large and the algorithm produces a large-margin classifier, then the amount of data needed depends only on R/γ [Bartlett & Shawe-Taylor '99]. Algorithmic implications: this suggests searching for a large-margin classifier... SVMs.
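
As a small illustration of the mistake-bound claim, here is a homogeneous Perceptron sketch on a toy large-margin dataset of my own; perceptron_mistakes is a hypothetical helper, not code from the deck.

```python
import numpy as np

def perceptron_mistakes(X, y, passes=10):
    # Homogeneous Perceptron: count the mistakes it makes over a few passes.
    # On a large-margin dataset this count stays small, independent of dimension.
    w = np.zeros(X.shape[1])
    mistakes = 0
    for _ in range(passes):
        for x, label in zip(X, y):
            if label * np.dot(w, x) <= 0:   # mistake: update w
                w += label * x
                mistakes += 1
    return w, mistakes

X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])  # large-margin toy set
y = np.array([1, 1, -1, -1])
print(perceptron_mistakes(X, y))   # converges after very few mistakes
```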

  6. Support Vector Machines (SVMs). Directly optimize for the maximum-margin separator: SVMs. First, assume we know a lower bound on the margin γ. Input: γ, S = {(x_1, y_1), ..., (x_m, y_m)}. Find: some w where ||w||^2 = 1 and, for all i, y_i (w ⋅ x_i) ≥ γ. Output: w, a separator of margin γ over S. This is the realizable case, where the data is linearly separable by margin γ.
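
A small sketch of the realizable-case requirement above, written as a check of whether a candidate w (normalized to unit norm) certifies margin γ on S; certifies_margin and the toy data are my own, not the deck's.

```python
import numpy as np

def certifies_margin(w, X, y, gamma):
    # Realizable case: does the (normalized) w satisfy y_i (w . x_i) >= gamma for all i?
    w = w / np.linalg.norm(w)               # enforce ||w|| = 1
    return bool(np.all(y * (X @ w) >= gamma))

X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
print(certifies_margin(np.array([1.0, 1.0]), X, y, gamma=1.0))   # True on this toy set
```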

  7. Support Vector Machines (SVMs). Directly optimize for the maximum-margin separator: SVMs. E.g., search for the best possible γ. Input: S = {(x_1, y_1), ..., (x_m, y_m)}. Find: some w and the maximum γ where ||w||^2 = 1 and, for all i, y_i (w ⋅ x_i) ≥ γ. Output: the maximum-margin separator over S.

  8. Support Vector Machines (SVMs). Directly optimize for the maximum-margin separator: SVMs. Input: S = {(x_1, y_1), ..., (x_m, y_m)}. Maximize γ under the constraints: ||w||^2 = 1 and, for all i, y_i (w ⋅ x_i) ≥ γ.

  9. Support Vector Machines (SVMs). Directly optimize for the maximum-margin separator: SVMs. Input: S = {(x_1, y_1), ..., (x_m, y_m)}. Maximize γ (the objective function) under the constraints: ||w||^2 = 1 and, for all i, y_i (w ⋅ x_i) ≥ γ. This is a constrained optimization problem. A famous example of constrained optimization is linear programming, where the objective function is linear and the constraints are linear (in)equalities.
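
For concreteness, here is a tiny linear program solved with scipy.optimize.linprog (assuming SciPy is available; the numbers are my own and unrelated to SVMs), just to show what a linear objective with linear inequality constraints looks like in code.

```python
import numpy as np
from scipy.optimize import linprog

# Tiny linear program: minimize -x1 - x2
# subject to x1 + 2*x2 <= 4, 3*x1 + x2 <= 6, x1 >= 0, x2 >= 0.
c = np.array([-1.0, -1.0])                      # linear objective (linprog minimizes)
A_ub = np.array([[1.0, 2.0], [3.0, 1.0]])       # linear inequality constraints
b_ub = np.array([4.0, 6.0])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, res.fun)                           # optimal point and objective value
```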

  10. Support Vector Machines (SVMs). Directly optimize for the maximum-margin separator: SVMs. Input: S = {(x_1, y_1), ..., (x_m, y_m)}. Maximize γ under the constraints: ||w||^2 = 1 and, for all i, y_i (w ⋅ x_i) ≥ γ. The constraint ||w||^2 = 1 is non-linear; in fact, it's even non-convex. (Figure: the unit circle w_1^2 + w_2^2 = 1 in the (w_1, w_2) plane.)

  11. Support Vector Machines (SVMs). Directly optimize for the maximum-margin separator: SVMs. Input: S = {(x_1, y_1), ..., (x_m, y_m)}. Maximize γ under the constraints: ||w||^2 = 1 and, for all i, y_i (w ⋅ x_i) ≥ γ. Let w' = w/γ; then maximizing γ is equivalent to minimizing ||w'||^2 (since ||w'||^2 = 1/γ^2). So, dividing both sides by γ and writing in terms of w', we get: Input: S = {(x_1, y_1), ..., (x_m, y_m)}. Minimize ||w'||^2 under the constraints: for all i, y_i (w' ⋅ x_i) ≥ 1. (Figure: the separator w' with the margin hyperplanes w' ⋅ x = -1 and w' ⋅ x = 1.)
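
A quick numeric sanity check of this rescaling argument, on toy data of my own: for a unit-norm w with margin γ on S, the rescaled w' = w/γ has ||w'||^2 = 1/γ^2 and satisfies y_i (w' ⋅ x_i) ≥ 1 for all i.

```python
import numpy as np

# Toy data and a unit-norm separator w with margin gamma on it.
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = np.array([1.0, 1.0]) / np.sqrt(2.0)

gamma = np.min(y * (X @ w))                           # margin of w on this set
w_prime = w / gamma                                   # the rescaled separator w' = w / gamma
print(np.linalg.norm(w_prime) ** 2, 1 / gamma ** 2)   # ||w'||^2 equals 1/gamma^2
print(np.all(y * (X @ w_prime) >= 1 - 1e-12))         # all constraints y_i (w' . x_i) >= 1 hold
```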

  12. Support Vector Machines (SVMs). Directly optimize for the maximum-margin separator: SVMs. Input: S = {(x_1, y_1), ..., (x_m, y_m)}. Find argmin_w ||w||^2 s.t.: for all i, y_i (w ⋅ x_i) ≥ 1. This is a constrained optimization problem. The objective is convex (quadratic) and all constraints are linear, so it can be solved efficiently (in poly time) using standard quadratic programming (QP) software.
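
A sketch of this QP using the cvxpy modeling library (assuming it is installed); the toy separable data are my own, and this is one possible encoding rather than the deck's.

```python
import numpy as np
import cvxpy as cp   # convex-optimization modeling package (assumed available)

# Hard-margin SVM primal as a QP on toy, linearly separable data.
X = np.array([[2.0, 2.0], [1.5, 2.5], [-2.0, -2.0], [-2.5, -1.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(X.shape[1])
objective = cp.Minimize(cp.sum_squares(w))            # ||w||^2
constraints = [cp.multiply(y, X @ w) >= 1]            # y_i (w . x_i) >= 1 for all i
cp.Problem(objective, constraints).solve()

print(w.value)                                        # maximum-margin separator
print(1 / np.linalg.norm(w.value))                    # its geometric margin, gamma = 1/||w||
```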

  13. Support Vector Machines (SVMs). Question: what if the data isn't perfectly linearly separable? Issue 1: we now have two objectives: maximize the margin and minimize the number of misclassifications. Answer 1: let's optimize their sum: minimize ||w||^2 + C (# misclassifications), where C is some trade-off constant. Issue 2: this is computationally hard (NP-hard), even if we didn't care about the margin and only minimized the number of mistakes [Guruswami-Raghavendra '06].

  15. Support Vector Machines (SVMs). Question: what if the data isn't perfectly linearly separable? Replace "# mistakes" with an upper bound called the "hinge loss". The hard-margin problem (minimize ||w'||^2 s.t., for all i, y_i (w' ⋅ x_i) ≥ 1) becomes: Input: S = {(x_1, y_1), ..., (x_m, y_m)}. Find argmin_{w, ξ_1, ..., ξ_m} ||w||^2 + C Σ_i ξ_i s.t.: for all i, y_i (w ⋅ x_i) ≥ 1 − ξ_i and ξ_i ≥ 0. The ξ_i are "slack variables".
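
A sketch of this slack-variable formulation, again using cvxpy (assumed available) on toy data of my own in which one point of each class sits on the wrong side, so some ξ_i must be positive at the optimum.

```python
import numpy as np
import cvxpy as cp   # assumed available

# Soft-margin SVM primal with explicit slack variables. In this toy set one point of
# each class lies on the wrong side, so no separator through the origin is perfect.
X = np.array([[2.0, 2.0], [1.0, 1.5], [-1.0, -1.0],
              [-2.0, -2.0], [-1.0, -1.5], [1.0, 1.0]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
m, d = X.shape
C = 1.0                                               # trade-off constant

w = cp.Variable(d)
xi = cp.Variable(m)                                   # slack variables xi_i
objective = cp.Minimize(cp.sum_squares(w) + C * cp.sum(xi))
constraints = [cp.multiply(y, X @ w) >= 1 - xi, xi >= 0]
cp.Problem(objective, constraints).solve()

print(w.value)       # separator
print(xi.value)      # nonzero slack on the points with functional margin < 1
```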

  16. Support Vector Machines (SVMs). Question: what if the data isn't perfectly linearly separable? Replace "# mistakes" with an upper bound called the "hinge loss". Input: S = {(x_1, y_1), ..., (x_m, y_m)}. Find argmin_{w, ξ_1, ..., ξ_m} ||w||^2 + C Σ_i ξ_i s.t.: for all i, y_i (w ⋅ x_i) ≥ 1 − ξ_i and ξ_i ≥ 0. The ξ_i are "slack variables". C controls the relative weighting between the twin goals of making ||w||^2 small (making the margin large) and ensuring that most examples have functional margin ≥ 1. Hinge loss: ℓ(w, x, y) = max(0, 1 − y (w ⋅ x)).

  17. Support Vector Machines (SVMs). Question: what if the data isn't perfectly linearly separable? Replace "# mistakes" with an upper bound called the "hinge loss". Input: S = {(x_1, y_1), ..., (x_m, y_m)}. Find argmin_{w, ξ_1, ..., ξ_m} ||w||^2 + C Σ_i ξ_i s.t.: for all i, y_i (w ⋅ x_i) ≥ 1 − ξ_i and ξ_i ≥ 0. That is, replace the number of misclassifications with the hinge loss: instead of ||w||^2 + C (# misclassifications), minimize ||w||^2 + C Σ_i ℓ(w, x_i, y_i), where ℓ(w, x, y) = max(0, 1 − y (w ⋅ x)).
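
A small numpy sketch of the hinge loss (hinge_loss is my own helper name, and the data and w are made up), which also checks the property that justifies the replacement: the hinge loss upper-bounds the number of mistakes.

```python
import numpy as np

def hinge_loss(w, X, y):
    # l(w, x, y) = max(0, 1 - y (w . x)), computed per example
    return np.maximum(0.0, 1.0 - y * (X @ w))

X = np.array([[2.0, 2.0], [1.0, 1.5], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([1.0, 1.0, 1.0, -1.0])            # the third point is misclassified by w below
w = np.array([0.5, 0.5])

losses = hinge_loss(w, X, y)
mistakes = (y * (X @ w) <= 0).astype(float)    # 0/1 loss
print(losses)                                  # per-example hinge loss
print(losses.sum() >= mistakes.sum())          # hinge loss upper-bounds # mistakes: True
```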

  18. Support Vector Machines (SVMs). Question: what if the data isn't perfectly linearly separable? Replace "# mistakes" with an upper bound called the "hinge loss". Input: S = {(x_1, y_1), ..., (x_m, y_m)}. Find argmin_{w, ξ_1, ..., ξ_m} ||w||^2 + C Σ_i ξ_i s.t.: for all i, y_i (w ⋅ x_i) ≥ 1 − ξ_i and ξ_i ≥ 0. Σ_i ξ_i is the total amount we have to move the points to get them on the correct side of the lines w ⋅ x = +1/−1, where the distance between the lines w ⋅ x = 0 and w ⋅ x = 1 counts as "1 unit". Hinge loss: ℓ(w, x, y) = max(0, 1 − y (w ⋅ x)).

  19. What if the data is far from being linearly separable? Example (figure: two image classes shown side by side): no good linear separator in the pixel representation. SVM philosophy: "Use a Kernel".
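
A sketch of this philosophy using scikit-learn (assuming it is available): on a concentric-circles dataset, a linear SVM is near chance while an RBF-kernel SVM fits almost perfectly. The dataset and parameters are my own illustration, not the deck's example.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: far from linearly separable in the original coordinates.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear", C=1.0).fit(X, y)
kernel_svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)

print("linear SVM training accuracy:", linear_svm.score(X, y))       # near chance
print("RBF-kernel SVM training accuracy:", kernel_svm.score(X, y))   # near 1.0
```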

  20. Support Vector Machines (SVMs). Primal form: Input: S = {(x_1, y_1), ..., (x_m, y_m)}. Find argmin_{w, ξ_1, ..., ξ_m} ||w||^2 + C Σ_i ξ_i s.t.: for all i, y_i (w ⋅ x_i) ≥ 1 − ξ_i and ξ_i ≥ 0. Which is equivalent to the Lagrangian dual: Input: S = {(x_1, y_1), ..., (x_m, y_m)}. Find argmin_α (1/2) Σ_i Σ_j y_i y_j α_i α_j (x_i ⋅ x_j) − Σ_i α_i s.t.: for all i, 0 ≤ α_i ≤ C, and Σ_i y_i α_i = 0.
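
A sketch of the Lagrangian dual as a QP in cvxpy (assumed available) on toy data of my own; the quadratic term is written as (1/2)||Σ_i α_i y_i x_i||^2, which equals the double sum above.

```python
import numpy as np
import cvxpy as cp   # assumed available

# Lagrangian dual of the soft-margin SVM as a QP on toy data.
X = np.array([[2.0, 2.0], [1.0, 1.5], [-2.0, -2.0], [-1.0, -1.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
m = len(y)
C = 1.0

alpha = cp.Variable(m)
# (1/2) sum_ij y_i y_j alpha_i alpha_j (x_i . x_j)  ==  (1/2) || sum_i alpha_i y_i x_i ||^2
objective = cp.Minimize(0.5 * cp.sum_squares(X.T @ cp.multiply(y, alpha)) - cp.sum(alpha))
constraints = [alpha >= 0, alpha <= C, cp.sum(cp.multiply(y, alpha)) == 0]
cp.Problem(objective, constraints).solve()

print(alpha.value)   # dual variables alpha_i
```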

  21. SVMs (Lagrangian Dual). Input: S = {(x_1, y_1), ..., (x_m, y_m)}. Find argmin_α (1/2) Σ_i Σ_j y_i y_j α_i α_j (x_i ⋅ x_j) − Σ_i α_i s.t.: for all i, 0 ≤ α_i ≤ C, and Σ_i y_i α_i = 0. The final classifier is w = Σ_i α_i y_i x_i. The points x_i for which α_i ≠ 0 are called the "support vectors".
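
A sketch using scikit-learn's linear SVC to read off the support vectors and reconstruct w = Σ_i α_i y_i x_i (scikit-learn stores y_i α_i for the support vectors in dual_coef_). The toy data are my own; note that scikit-learn also fits an intercept term, which the homogeneous setting on these slides omits.

```python
import numpy as np
from sklearn.svm import SVC

# Toy separable data; SVC stores y_i * alpha_i for the support vectors in dual_coef_,
# so w = sum_i alpha_i y_i x_i is dual_coef_ @ (support vectors).
X = np.array([[2.0, 2.0], [1.0, 1.5], [-2.0, -2.0], [-1.0, -1.5], [3.0, 1.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1, 1, -1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

support_vectors = X[clf.support_]                 # the x_i with alpha_i != 0
w_from_alphas = clf.dual_coef_ @ support_vectors  # sum_i alpha_i y_i x_i
print(support_vectors)
print(w_from_alphas, clf.coef_)                   # the two agree
```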
