
Retaining the Support Vectors: Why Retain the Support Vectors? - PowerPoint PPT Presentation



Class #11: Kernel Functions & SVMs, II
Machine Learning (COMP 135): M. Allen, 09 Oct. 2019

Review: Support Vector Machines (SVMs)

1. Start with a labeled data set:
   { (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) }   [ ∀i, y_i ∈ {+1, −1} ]
2. Solve a constrained quadratic optimization problem:
   Maximize:  W(α) = Σ_i α_i − (1/2) Σ_{i,j} α_i α_j y_i y_j (x_i · x_j)
   while satisfying the constraints:  ∀i, α_i ≥ 0   and   Σ_i α_i y_i = 0
3. Derive the necessary weights and bias for the decision separator, when and if needed:
   w = Σ_i α_i y_i x_i
   b = −(1/2) ( max_{i | y_i = −1} w · x_i  +  min_{j | y_j = +1} w · x_j )

Retaining the Support Vectors

- After computing the optimizing α values, the SVM typically ends up with:
  1. A large number of data points x_i with α_i = 0
  2. A few special data points x_j with α_j ≠ 0
- These special points, the support vectors, can by themselves be used to compute the necessary weights and bias.
- Often, the SVM keeps a list of these vectors for computing later classification functions, rather than keeping the weights that define the classification boundary directly.

Why Retain the Support Vectors?

- The α_i values are 0 everywhere except at the support vectors (the points closest to the separator).
- Retaining only the support vectors comes in handy if we ever want to update the decision boundary as new data comes in for classification.

[Figure: a two-class data set in the (x_1, x_2) plane; the large masses of points on either side have α_i = 0, and only the few support vectors near the separator have α_j ≠ 0.]
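Not part of the slides: a minimal sketch, assuming scikit-learn and NumPy are available, showing that the weight vector w from step 3 can be rebuilt from the retained support vectors alone, since every other point has α_i = 0. The toy data set is made up for illustration; scikit-learn's SVC happens to store exactly the needed quantities (dual_coef_ holds α_i y_i, support_vectors_ holds the x_i).

```python
# Sketch: recover w = sum_i alpha_i y_i x_i from the support vectors only.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (20, 2)), rng.normal(+2, 1, (20, 2))])
y = np.array([-1] * 20 + [+1] * 20)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# dual_coef_ stores alpha_i * y_i for the support vectors only;
# points with alpha_i = 0 are not kept at all.
alpha_y = clf.dual_coef_.ravel()          # shape: (n_support_vectors,)
sv = clf.support_vectors_                 # the retained support vectors

w = alpha_y @ sv                          # only support vectors contribute
print(np.allclose(w, clf.coef_.ravel()))  # True: matches sklearn's own weights
```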

Why Retain the Support Vectors?

- In such scenarios, the data for which α = 0 before remains that way, and never needs to be reconsidered when solving the (compute-intensive) optimization step of the SVM.
- If the new data remains close to the old boundary, then we can compute the new α-values using only the new data and the previous support vectors.
- In the example shown, only three points have their α-values re-computed.

[Figure: the same two-class data set with a few new points arriving near the old boundary; only those points and the previous support vectors receive new α-values.]

Why Retain the Support Vectors?

- Another reason to retain vectors rather than weights is that SVMs are often used with kernel functions that:
  1. Transform the data  (ϕ : R^n → R^m)
  2. Compute the necessary dot-products of points:  k(x, z) = ϕ(x) · ϕ(z)
- Furthermore, there are some popular such functions where the transform takes n-dimensional data to m-dimensional data with n << m.
- In such cases, storing the original n-dimensional data, and computing the transformation only when necessary, can be much more efficient than trying to store the m-dimensional weight information. (A short numerical check of this identity appears below, after the pros and cons.)
- This is especially true in cases where m = ∞ (!!)

Pros and Cons of SVMs

- [+] Compared to linear classifiers like logistic regression, SVMs:
  1. Are insensitive to outliers in the data (extreme class examples)
  2. Give a robust boundary for separable classes
  3. Can handle high-dimensional data, via transformation
  4. Can find optimal α-values, with no local maxima
- [–] Compared to linear classifiers like logistic regression, SVMs:
  1. Are less applicable in multi-class (c > 2) instances
  2. Require more complex tuning, via hyper-parameter selection
  3. May require some deep thinking or experimentation in order to select the appropriate kernel functions
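As referenced above, here is a minimal sketch (not from the slides; the names phi and poly2_kernel are mine) of the identity k(x, z) = ϕ(x) · ϕ(z) for a degree-2 polynomial kernel: the kernel returns the dot product in the n²-dimensional transformed space while only ever touching the original n-dimensional vectors.

```python
import numpy as np

def phi(x):
    # Explicit degree-2 feature map: all pairwise products x_i * x_j,
    # taking an n-dimensional input to an m = n^2 dimensional output.
    return np.outer(x, x).ravel()

def poly2_kernel(x, z):
    # k(x, z) = (x . z)^2, which equals phi(x) . phi(z) but costs only O(n).
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0, 3.0])
z = np.array([0.5, -1.0, 2.0])

# Both lines print 20.25: the kernel gives the transformed-space dot product
# without ever constructing the m-dimensional vectors.
print(np.dot(phi(x), phi(z)))
print(poly2_kernel(x, z))
```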

Gaussian Radial Basis Function (RBF)

- A popular kernel with many uses is the Gaussian RBF:
  k(x, z) = e^( −||x − z||² / (2σ²) )
- The RBF is based on a distance from a central focal point, z.
- The distance can be measured in a variety of ways, but is often Euclidean:
  ||x − z|| = sqrt( Σ_{i=1}^{n} (x_i − z_i)² )
- When ||x − z|| = 0:  k(x, z) = e^0 = 1, so the function's value is highest at the point z itself.
- When ||x − z|| → ∞:  k(x, z) → e^(−∞) = 0, so the value drops to 0 as we get further from z.

[Figure: plots of the RBF bump centered at z. Image source: https://www.cs.toronto.edu/~duvenaud/cookbook/]
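A short numerical check (not from the slides; the helper name rbf is mine) of the two limiting cases above, with the Euclidean distance written out explicitly.

```python
import numpy as np

def rbf(x, z, sigma=1.0):
    dist = np.sqrt(np.sum((x - z) ** 2))            # Euclidean ||x - z||
    return np.exp(-dist ** 2 / (2 * sigma ** 2))    # k(x, z)

z = np.array([1.0, 1.0])
print(rbf(z, z))                        # 1.0: the kernel is maximal at the focal point z
print(rbf(np.array([50.0, 50.0]), z))   # ~0.0: the value vanishes far from z
```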

Gaussian Radial Basis Function: Tuning Parameter

- The parameter σ in k(x, z) = e^( −||x − z||² / (2σ²) ) controls the diameter of the non-zero area.
- As σ → ∞:  k(x, z) → e^0 = 1. If σ gets larger, the non-zero area becomes wider.
- As σ → 0:  k(x, z) → e^(−∞) = 0. If σ gets smaller, the non-zero area becomes narrower.
- The radius around the focal point z at which the function becomes (effectively) 0 corresponds to the decision boundary in our data.

[Figure: RBF bumps of varying width, and a bump overlaid on a two-class data set in the (x_1, x_2) plane. Image source as above.]
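A quick illustration (again not from the slides) of the effect of σ: the same pair of points at distance 2, evaluated under several σ values, moves from effectively "outside" the bump to well "inside" it.

```python
import numpy as np

def rbf(x, z, sigma):
    return np.exp(-np.linalg.norm(x - z) ** 2 / (2 * sigma ** 2))

x, z = np.array([2.0, 0.0]), np.array([0.0, 0.0])   # ||x - z|| = 2
for sigma in (0.5, 1.0, 2.0, 10.0):
    print(sigma, rbf(x, z, sigma))
# sigma = 0.5  -> ~0.0003  (narrow bump: the point is effectively outside)
# sigma = 10.0 -> ~0.98    (wide bump: the point still counts as close)
```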

Gaussian Radial Basis Function: Multiple Clusters

- We can deal with multiple clusters in the data by using a combination of multiple RBFs, e.g. with three focal points z_1, z_2, z_3:
  k(x, z_1, z_2, z_3) = Σ_{j=1}^{3} e^( −||x − z_j||² / (2σ²) )
  (A short sketch of this sum appears after the announcements below.)

[Figure: a data set with several clusters in the (x_1, x_2) plane, each covered by its own RBF bump.]

Next Week

- Topics: SVMs and Feature Engineering
- Meetings: Tuesday and Wednesday, usual time
- Readings: Linked from the class website schedule page
  - Includes the original paper (Brown, et al.) for discussion
- Homework 03: due Wednesday, 16 October, 9:00 AM
- Project 01: out Tuesday; due Monday, 04 November, 9:00 AM
- Office Hours: 237 Halligan, Tuesday, 11:00 AM – 1:00 PM
  - TA hours can be found on the class website as well
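As referenced on the multi-cluster slide above, a minimal sketch (the name multi_rbf and the three center coordinates are mine, for illustration only) of summing one RBF term per focal point z_j.

```python
import numpy as np

def multi_rbf(x, centers, sigma=1.0):
    """k(x, z_1, ..., z_k) = sum_j exp(-||x - z_j||^2 / (2 sigma^2))."""
    diffs = np.asarray(centers) - x
    return np.sum(np.exp(-np.sum(diffs ** 2, axis=1) / (2 * sigma ** 2)))

centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])   # three cluster centers
print(multi_rbf(np.array([0.1, 0.1]), centers))   # ~1: close to the first center
print(multi_rbf(np.array([2.5, 2.5]), centers))   # small: far from all centers
```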
