Maximum Margin Classifiers


  1. MAXIMUM MARGIN CLASSIFIERS, Matthieu R Bloch, Tuesday, February 11, 2020

  2. LOGISTICS
     TAs and office hours
     - Tuesday: TJ (VL C449 Cubicle D), 1:30pm-2:45pm
     - Wednesday: Matthieu (TSRB 423), 12:00pm-1:15pm
     - Thursday: Hossein (VL C449 Cubicle B), 10:45am-12:00pm
     - Friday: Brighton (TSRB 523a), 12:00pm-1:15pm
     Homework 3
     - Due Wednesday, February 19, 2020, 11:59pm EST (Friday, February 21, 2020 for DL)
     - Please include a separate PDF with your plots and listings
     - Make sure you show your work; don't leave gaps in logic
     Honor code
     - Cite your sources
     - Refrain from using solutions from previous years

  3. RECAP: MAXIMUM MARGIN HYPERPLANE
     "All separating hyperplanes are equal but some are more equal than others"
     - Margin: $\rho(w, b) \triangleq \min_i \frac{|w^\top x_i + b|}{\|w\|_2}$
     - The maximum margin hyperplane is the solution of $(w^*, b^*) = \operatorname{argmax}_{w, b} \rho(w, b)$
     - A larger margin leads to better generalization
     Definition. The canonical form $(w, b)$ of a separating hyperplane is such that $\forall i\; y_i(w^\top x_i + b) \geq 1$ and $\exists i^*$ s.t. $y_{i^*}(w^\top x_{i^*} + b) = 1$
     - For canonical hyperplanes, the optimization problem is $\operatorname{argmin}_{w, b} \frac{1}{2}\|w\|_2^2$ s.t. $\forall i\; y_i(w^\top x_i + b) \geq 1$
     - This is a constrained quadratic program; we know how to solve this really well (see the sketch below)
     - We will come back to this when we talk about support vector machines
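
The canonical-form problem above is a small quadratic program, so a generic convex solver can find the maximum margin hyperplane directly. The sketch below is not part of the lecture: it assumes the cvxpy package and made-up, linearly separable toy data, and simply transcribes the constrained QP from the slide.

```python
import numpy as np
import cvxpy as cp

# Hypothetical linearly separable toy data: two well-separated blobs in R^2.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])

# Canonical-form QP: minimize (1/2)||w||_2^2 s.t. y_i (w^T x_i + b) >= 1 for all i.
w = cp.Variable(2)
b = cp.Variable()
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()

# For a canonical hyperplane the achieved margin is 1 / ||w*||_2.
rho = 1 / np.linalg.norm(w.value)
print("w =", w.value, "b =", b.value, "margin =", rho)
```

The last line reports the margin $1/\|w^*\|_2$, consistent with the canonical-form normalization.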

  4. (figure-only slide, no text captured)

  5. (figure-only slide, no text captured)

  6. OPTIMAL SOFT-MARGIN HYPERPLANE
     What if our data is not linearly separable?
     - The constraint $\forall i\; y_i(w^\top x_i + b) \geq 1$ cannot be satisfied
     - Introduce slack variables $\xi_i > 0$ such that $\forall i\; y_i(w^\top x_i + b) \geq 1 - \xi_i$
     The optimal soft-margin hyperplane is the solution of
     $\operatorname{argmin}_{w, b, \xi} \frac{1}{2}\|w\|_2^2 + \frac{C}{N}\sum_{i=1}^{N} \xi_i$ s.t. $\forall i\; y_i(w^\top x_i + b) \geq 1 - \xi_i$ and $\xi_i \geq 0$
     - $C > 0$ is a cost set by the user, which controls the influence of outliers
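
Like the hard-margin case, the soft-margin program is a QP. A minimal sketch, again assuming cvxpy and hypothetical overlapping blobs (neither comes from the slides), shows how the slack variables enter and how C weighs total slack against the margin term.

```python
import numpy as np
import cvxpy as cp

# Hypothetical non-separable toy data: two overlapping blobs in R^2.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1.0, (30, 2)), rng.normal(1, 1.0, (30, 2))])
y = np.hstack([-np.ones(30), np.ones(30)])
N, C = len(y), 1.0  # C > 0 trades margin width against total slack

# minimize (1/2)||w||_2^2 + (C/N) * sum_i xi_i
# s.t.     y_i (w^T x_i + b) >= 1 - xi_i  and  xi_i >= 0
w, b, xi = cp.Variable(2), cp.Variable(), cp.Variable(N)
objective = cp.Minimize(0.5 * cp.sum_squares(w) + (C / N) * cp.sum(xi))
constraints = [cp.multiply(y, X @ w + b) >= 1 - xi, xi >= 0]
cp.Problem(objective, constraints).solve()

print("w =", w.value, "b =", b.value, "total slack =", xi.value.sum())
```

Increasing C penalizes slack more heavily, so the solution tolerates fewer margin violations; decreasing C widens the margin at the cost of more violations.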

  7. (figure-only slide, no text captured)

  8. NON LINEAR FEATURES
     - LDA, logistic, PLA are all linear classifiers: the classification region boundaries are hyperplanes
     - Some datasets are not linearly separable!
     - We can create nonlinear classifiers by transforming the data through a nonlinear map $\Phi: \mathbb{R}^d \to \mathbb{R}^p$,
       $\Phi: \begin{bmatrix} x_1 \\ \vdots \\ x_d \end{bmatrix} \mapsto \begin{bmatrix} \phi_1(x) \\ \vdots \\ \phi_p(x) \end{bmatrix}$
     - One can then apply linear methods on the transformed feature vector $\Phi(x)$
     Example. Ring data (see the sketch below)
     Challenges: if $p \gg n$, this gets computationally challenging and there is a risk of overfitting!
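
As a concrete illustration of the ring-data example, the sketch below (assuming scikit-learn; the specific map and data are made up for illustration) lifts 2-D points by appending the squared radius as a feature, after which a plain linear classifier separates the two classes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical "ring" data: inner disk (class -1) vs. surrounding ring (class +1).
rng = np.random.default_rng(2)
r = np.hstack([rng.uniform(0, 1, 100), rng.uniform(2, 3, 100)])
theta = rng.uniform(0, 2 * np.pi, 200)
X = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
y = np.hstack([-np.ones(100), np.ones(100)])

# Nonlinear map Phi: R^2 -> R^3, appending the squared radius x1^2 + x2^2.
Phi_X = np.column_stack([X, (X ** 2).sum(axis=1)])

# A linear classifier struggles on X but separates Phi(X) easily.
print("raw accuracy:   ", LogisticRegression(max_iter=1000).fit(X, y).score(X, y))
print("lifted accuracy:", LogisticRegression(max_iter=1000).fit(Phi_X, y).score(Phi_X, y))
```

Here p = 3 is still tiny; the computational and overfitting concerns on the slide arise when the lift produces p ≫ n features.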

  9. (figure-only slide, no text captured)

  10. (figure-only slide, no text captured)

  11. (figure-only slide, no text captured)

  12. KERNEL METHODS - AN OBSERVATION
      Consider the maximum margin hyperplane with nonlinear transform $\Phi: \mathbb{R}^d \to \mathbb{R}^p$:
      $\operatorname{argmin}_{w, b} \frac{1}{2}\|w\|_2^2$ s.t. $\forall i\; y_i(w^\top \Phi(x_i) + b) \geq 1$
      One can show (later) that the optimal $w$ is of the form $w = \sum_{i=1}^{N} \alpha_i \Phi(x_i)$, so that
      $\|w\|_2^2 = w^\top w = \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j \Phi(x_i)^\top \Phi(x_j) = \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j \langle \Phi(x_i), \Phi(x_j) \rangle$
      $w^\top \Phi(x_j) = \sum_{i=1}^{N} \alpha_i \Phi(x_i)^\top \Phi(x_j) = \sum_{i=1}^{N} \alpha_i \langle \Phi(x_i), \Phi(x_j) \rangle$
      The only quantities we really care about are the dot products $\langle \Phi(x_i), \Phi(x_j) \rangle$ (see the sketch below)
      - There are only $N^2$ of them
      - The dimension of $\Phi(x)$ does not appear explicitly (it is hidden in the dot product); we only work in $\mathbb{R}^d$
      - The nonlinear features need not be computed explicitly in $\langle \Phi(x_i), \Phi(x_j) \rangle$
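
The identity $\|w\|_2^2 = \sum_{i,j} \alpha_i \alpha_j \langle \Phi(x_i), \Phi(x_j) \rangle$ is easy to check numerically. The sketch below uses plain numpy with an arbitrary, made-up feature map and random coefficients (none taken from the slides); it only confirms that the Gram matrix of pairwise dot products carries all the information needed.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(10, 2))         # fixes an arbitrary nonlinear map R^2 -> R^10

def Phi(x):
    # Hypothetical nonlinear feature map, chosen only for illustration.
    return np.tanh(A @ x)

X = rng.normal(size=(5, 2))          # N = 5 points in R^d with d = 2
alpha = rng.normal(size=5)           # some coefficients alpha_i

# Explicit route: build w = sum_i alpha_i Phi(x_i) in feature space.
features = np.array([Phi(x) for x in X])
w = features.T @ alpha
norm_explicit = w @ w

# Kernel route: only the N^2 inner products <Phi(x_i), Phi(x_j)> are needed.
G = features @ features.T            # Gram matrix of pairwise dot products
norm_via_gram = alpha @ G @ alpha    # ||w||_2^2 = sum_ij alpha_i alpha_j G_ij

print(np.isclose(norm_explicit, norm_via_gram))   # True
```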

  13. KERNEL METHODS - THE TRICK
      Implicitly define features through the choice of a kernel
      Definition. (Inner product kernel) An inner product kernel is a mapping $k: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ for which there exist a Hilbert space $\mathcal{H}$ and a mapping $\Phi: \mathbb{R}^d \to \mathcal{H}$ such that
      $\forall u, v \in \mathbb{R}^d \quad k(u, v) = \langle \Phi(u), \Phi(v) \rangle_{\mathcal{H}}$
      Example. Quadratic kernel $k(u, v) = (u^\top v)^2$ (see the sketch below)
      Definition. (Positive semidefinite kernel) A function $k: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ is a positive semidefinite kernel if
      - $k$ is symmetric, i.e., $k(u, v) = k(v, u)$
      - for all $\{x_i\}_{i=1}^{N}$, the Gram matrix $K$ is positive semidefinite, i.e., $x^\top K x \geq 0$ with $K = [K_{i,j}]$ and $K_{i,j} \triangleq k(x_i, x_j)$
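
To connect the two definitions on the quadratic-kernel example, the sketch below (plain numpy; the explicit map $\Phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$ is a standard realization in d = 2, stated here as an assumption rather than taken from the slides) checks that $k(u, v) = (u^\top v)^2$ equals $\langle \Phi(u), \Phi(v) \rangle$ and that its Gram matrix is positive semidefinite.

```python
import numpy as np

# Quadratic kernel k(u, v) = (u^T v)^2 from the slide.
def k(u, v):
    return (u @ v) ** 2

# Explicit map realizing it in d = 2: Phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2).
def Phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

rng = np.random.default_rng(4)
u, v = rng.normal(size=2), rng.normal(size=2)
print(np.isclose(k(u, v), Phi(u) @ Phi(v)))        # kernel = inner product of features

# Positive semidefiniteness: the Gram matrix of any point set has eigenvalues >= 0.
X = rng.normal(size=(6, 2))
K = np.array([[k(xi, xj) for xj in X] for xi in X])
print(np.allclose(K, K.T), np.linalg.eigvalsh(K).min() >= -1e-9)
```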

  14. (figure-only slide, no text captured)
