MACHINE LEARNING
Classifiers: Support Vector Machine

What is Classification?
[Figure: detecting facial attributes (Children, Female Adult); image: Sony (Make Believe); He & Zhang, Pattern Recognition, 2011.]
The training set must be as unambiguous as possible. This is not easy, especially as members of different classes may share similar attributes. Learning implies generalization: which features of each member of a class make the class most distinguishable from the other classes?
[Example classes: Children, Female Adult, Male Adult.]
Whenever possible, the classes should be balanced. With a garbage model (male adult versus anything that is neither a female adult nor a child), the classes can no longer be balanced!
There is a plethora of classifiers. In this class, we will see only SVM and Boosting for mixtures of classifiers. Each classifier type has its pros and cons: there is no single optimal classifier.
Brief history: SVM was invented by Vladimir Vapnik. It started with the invention of statistical learning theory (Vapnik, 1979). The current form of SVM was presented in Boser, Guyon and Vapnik (1992) and in Cortes and Vapnik (1995). Textbooks: an easy introduction to SVM is given in Learning with Kernels by Bernhard Schölkopf and Alexander Smola; a good survey of the theory behind SVM is given in Support Vector Machines and Other Kernel-Based Learning Methods by Nello Cristianini and John Shawe-Taylor.
The success of SVM is mainly due to its strong theoretical grounding and its good performance in practice. SVM has been applied to numerous classification problems (vision, text, handwriting, etc.).
[Figure: examples rated 'good', 'OK', 'bad'.]
How would you classify this data? (Points labeled +1 and -1; candidate separating hyperplane parameterized by (w, b).)
Any of these would be fine... but which is best?
Linear SVM: choose the separating hyperplane with the maximum margin.
Support vectors are those datapoints that the margin pushes up against.
We need to determine a measure of the margin, and then to maximize this measure.
Definition: the separating hyperplane is the set $\{x : \langle w, x\rangle + b = 0\}$. The margin boundaries on either side of the hyperplane satisfy $\{x : \langle w, x\rangle + b = +1\}$ (class with label $y = +1$) and $\{x : \langle w, x\rangle + b = -1\}$ (class with label $y = -1$).
Points on either side of the separating plane have positive and negative values of $\langle w, x\rangle + b$, respectively (points further from the hyperplane have larger absolute values, e.g. $\langle w, x\rangle + b = \pm 2, \pm 3$).
Take $x'$ on the hyperplane, i.e. $\langle w, x'\rangle + b = 0$, and $x$ on the margin boundary, i.e. $\langle w, x\rangle + b = 1$. The projection of $x - x'$ onto the unit vector $w/\|w\|$ is
$\left\langle \tfrac{w}{\|w\|},\, x - x' \right\rangle = \frac{\langle w, x\rangle - \langle w, x'\rangle}{\|w\|} = \frac{(\langle w, x\rangle + b) - (\langle w, x'\rangle + b)}{\|w\|} = \frac{1}{\|w\|}.$
The margin between the two classes is hence at least $2/\|w\|$.
Equivalently, take two points $x_1$ and $x_2$ on either side of the margin: $\langle w, x_1\rangle + b = +1$ (class with label $y = +1$) and $\langle w, x_2\rangle + b = -1$ (class with label $y = -1$). Then $\langle w, x_1 - x_2\rangle = 2$, so the distance between the two margin boundaries along $w$ is $\frac{\langle w, x_1 - x_2\rangle}{\|w\|} = \frac{2}{\|w\|}$: the margin between the two classes is at least $2/\|w\|$.
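To make the margin formula concrete, here is a minimal numeric sketch in Python (the vector w, the offset b and the two points are made up for illustration): it checks that points on the two margin boundaries lie at signed distance +1/||w|| and -1/||w|| from the hyperplane, so the margin is 2/||w||.

```python
import numpy as np

# Hyperplane <w, x> + b = 0 with margin boundaries <w, x> + b = +/-1 (made-up values).
w = np.array([3.0, 4.0])          # ||w|| = 5
b = -2.0

def signed_distance(x):
    """Signed distance from x to the hyperplane <w, x> + b = 0."""
    return (w @ x + b) / np.linalg.norm(w)

x_plus = np.array([1.0, 0.0])     # satisfies <w, x> + b = +1
x_minus = np.array([-1.0, 1.0])   # satisfies <w, x> + b = -1

print(signed_distance(x_plus))    #  1/||w|| =  0.2
print(signed_distance(x_minus))   # -1/||w|| = -0.2
print(2.0 / np.linalg.norm(w))    # margin 2/||w|| = 0.4
```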
Maximizing the margin $2/\|w\|$ is equivalent to solving the primal optimization problem:
$\min_{w,\, b} \ \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i\left(\langle w, x_i\rangle + b\right) \ge 1, \qquad i = 1, 2, \dots, M.$
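As an illustration of this primal problem, the following sketch solves it directly with CVXPY on a made-up, linearly separable toy dataset (this is only for clarity; dedicated SVM solvers are used in practice).

```python
import numpy as np
import cvxpy as cp

# Toy, linearly separable data (made up for illustration); labels in {-1, +1}.
X = np.array([[2.0, 2.0], [2.5, 1.5], [3.0, 3.0],
              [0.0, 0.0], [0.5, -1.0], [-1.0, 0.5]])
y = np.array([1, 1, 1, -1, -1, -1])

w = cp.Variable(2)
b = cp.Variable()

# Hard-margin primal: minimize 1/2 ||w||^2  s.t.  y_i (<w, x_i> + b) >= 1
constraints = [cp.multiply(y, X @ w + b) >= 1]
problem = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)), constraints)
problem.solve()

print("w =", w.value, " b =", b.value)
print("margin 2/||w|| =", 2.0 / np.linalg.norm(w.value))
```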
Rephrase the constrained minimization problem in terms of Lagrange multipliers $\alpha_i$, $i = 1, \dots, M$ ($M$ = number of data points), one for each inequality constraint, to obtain the Lagrangian:
$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{M} \alpha_i \left[ y_i\left(\langle w, x_i\rangle + b\right) - 1 \right], \qquad \alpha_i \ge 0.$
(Minimization of a convex function under linear constraints through Lagrange multipliers gives the optimal solution.)
The solution of this problem is found by maximizing over $\alpha$ and minimizing over $w$ and $b$:
$\max_{\alpha \ge 0}\ \min_{w,\, b}\ L(w, b, \alpha) = \max_{\alpha \ge 0}\ \min_{w,\, b} \left\{ \frac{1}{2}\|w\|^2 - \sum_{i=1}^{M} \alpha_i \left[ y_i\left(\langle w, x_i\rangle + b\right) - 1 \right] \right\}.$
Requiring that the gradient of $L$ with respect to $w$ vanishes gives
$\frac{\partial L(w, b, \alpha)}{\partial w} = w - \sum_{i=1}^{M} \alpha_i\, y_i\, x_i = 0 \ \Rightarrow\ w = \sum_{i=1}^{M} \alpha_i\, y_i\, x_i.$
The vector defining the hyperplane is thus determined by the training points. Note that while $w$ is unique (minimization of a convex function), the $\alpha_i$ are not unique.
Requiring that the gradient of $L$ with respect to $b$ vanishes gives
$\frac{\partial L(w, b, \alpha)}{\partial b} = -\sum_{i=1}^{M} \alpha_i\, y_i = 0 \ \Rightarrow\ \sum_{i=1}^{M} \alpha_i\, y_i = 0.$
This requires at least one datapoint in each class.
These stationarity conditions hold together with the original constraints of the primal problem:
$y_i\left(\langle w, x_i\rangle + b\right) \ge 1, \qquad i = 1, \dots, M.$
Complete optimization problem, Karush-Kuhn-Tucker conditions:
$\frac{\partial L(w, b, \alpha)}{\partial w} = w - \sum_{i=1}^{M} \alpha_i\, y_i\, x_i = 0$ (stationarity)
$\frac{\partial L(w, b, \alpha)}{\partial b} = -\sum_{i=1}^{M} \alpha_i\, y_i = 0$ (stationarity)
$y_i\left(\langle w, x_i\rangle + b\right) \ge 1, \ i = 1, \dots, M$ (primal feasibility)
$\alpha_i \ge 0, \ i = 1, \dots, M$ (dual feasibility)
$\alpha_i \left[ y_i\left(\langle w, x_i\rangle + b\right) - 1 \right] = 0, \ i = 1, \dots, M$ (complementarity)
For points that are correctly classified strictly outside the margin, i.e. $y_i\left(\langle w, x_i\rangle + b\right) > 1$, the complementarity condition forces $\alpha_i = 0$.
The $\alpha_i$, $i = 1, \dots, M$, determine the solutions to the constraints. All pairs of data points $(x_i, y_i)$ for which $\alpha_i > 0$ are the support vectors. All pairs of data points $(x_i, y_i)$ for which $\alpha_i = 0$ are "irrelevant" when computing the margin.
Consider 3 cases for $y_i\left(w^T x_i + b\right)$:
$y_i\left(w^T x_i + b\right) = 1$: the point lies exactly on a margin boundary $\{x : \langle w, x\rangle + b = \pm 1\}$;
$y_i\left(w^T x_i + b\right) > 1$: the point lies outside the margin (correctly classified, since $w^T x_i + b > 0$ for $y_i = +1$ and $w^T x_i + b < 0$ for $y_i = -1$);
$y_i\left(w^T x_i + b\right) < 1$: the point lies inside the margin and does not satisfy the constraint.
From $\frac{\partial L(w, b, \alpha)}{\partial w} = 0$ we obtain the normal of the hyperplane:
$w = \sum_{i=1}^{M} \alpha_i\, y_i\, x_i.$
Use $y_i\left(\langle w, x_i\rangle + b\right) = 1$ for any support vector (any $i$ with $\alpha_i > 0$) to compute $b$.
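In practice one rarely computes the $\alpha_i$ by hand. Below is a minimal sketch (toy data made up for illustration) using scikit-learn's SVC with a linear kernel: the attribute dual_coef_ stores alpha_i * y_i for the support vectors, so w can be reconstructed exactly as in the formula above, and b is returned as intercept_.

```python
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable data (made up for illustration); labels in {-1, +1}.
X = np.array([[2.0, 2.0], [2.5, 1.5], [3.0, 3.0],
              [0.0, 0.0], [0.5, -1.0], [-1.0, 0.5]])
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6)   # a very large C approximates the hard margin
clf.fit(X, y)

# dual_coef_ holds alpha_i * y_i for the support vectors only
w = clf.dual_coef_ @ clf.support_vectors_        # w = sum_i alpha_i y_i x_i
b = clf.intercept_

print("support vectors:\n", clf.support_vectors_)
print("w =", w.ravel(), " (compare clf.coef_ =", clf.coef_.ravel(), ")")
print("b =", b)
print("margin 2/||w|| =", 2.0 / np.linalg.norm(w))
```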
When the data are not linearly separable, introduce slack variables $\xi_i \ge 0$, one per datapoint, that measure by how much each point violates the margin. The constraints can again be expressed through a compact notation:
$y_i\left(w^T x_i + b\right) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, M.$
[Figure: points crossing the margin, with slack values $\xi_1, \xi_2, \xi_3$.]
The objective becomes:
$\min_{w,\, b,\, \xi}\ \frac{1}{2}\|w\|^2 + C \sum_{j=1}^{M} \xi_j, \quad \text{subject to} \quad y_j\left(w^T x_j + b\right) \ge 1 - \xi_j, \ \ \xi_j \ge 0.$
$C > 0$ weights the influence of the penalty term.
The hyperplane has the same form of solution as in the separable case: $w = \sum_{j=1}^{M} \alpha_j\, y_j\, x_j.$
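A minimal sketch of the soft-margin behaviour on overlapping toy data (everything made up for illustration): as C grows, margin violations are penalized more heavily, the margin shrinks, and fewer training errors are tolerated.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping Gaussian blobs (made up for illustration); labels in {-1, +1}.
X = np.vstack([rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2)),
               rng.normal(loc=[2.0, 2.0], scale=1.0, size=(50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])

for C in [0.01, 1.0, 100.0]:
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_.ravel()
    print(f"C={C:7.2f}  margin={2.0 / np.linalg.norm(w):.3f}  "
          f"support vectors={len(clf.support_)}  "
          f"training errors={(clf.predict(X) != y).sum()}")
```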
[Figure: separating hyperplane $w^T x + b = 0$ with margin boundaries $w^T x + b = \pm 1$; support vectors highlighted.]
The decision boundary is determined only by the support vectors:
$w = \sum_{i=1}^{M} \alpha_i\, y_i\, x_i,$
with $\alpha_i = 0$ for non-support vectors and $\alpha_i > 0$ for support vectors.
What if the points in the input space cannot be separated by a linear hyperplane?
As usual, observe that the decision function of the linear SVM computes an inner product across pairs of observations:
$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{M} \alpha_i\, y_i\, \langle x_i, x\rangle + b \right),$
and the dual problem depends on the data only through the inner products $\langle x_i, x_j\rangle$.
Map the data into a feature space through a nonlinear transformation $\phi(x)$. The normal of the hyperplane becomes
$w = \sum_{i=1}^{M} \alpha_i\, y_i\, \phi(x_i);$
$w$ lives in feature space now!
$w = \sum_{i=1}^{M} \alpha_i\, y_i\, \phi(x_i)$
Use the kernel trick: exploit the fact that the decision function depends only on dot products in feature space,
$k(x_i, x_j) = \left\langle \phi(x_i), \phi(x_j) \right\rangle.$
The optimization problem in feature space becomes:
$\max_{\alpha}\ \sum_{i=1}^{M} \alpha_i - \frac{1}{2}\sum_{i=1}^{M}\sum_{j=1}^{M} \alpha_i\, \alpha_j\, y_i\, y_j\, k(x_i, x_j), \quad \text{subject to} \quad \alpha_i \ge 0, \ \ \sum_{i=1}^{M} \alpha_i\, y_i = 0.$
The decision function in feature space is computed as follows:
$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{M} \alpha_i\, y_i\, k(x_i, x) + b \right).$
Both depend on the data only through the kernel.
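A minimal sketch of the kernel trick with scikit-learn (toy data and parameter values made up for illustration): the decision function of an RBF-kernel SVC is re-computed by hand from its support vectors, the dual coefficients alpha_i * y_i, the intercept b and the kernel matrix, exactly as in the formula above.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(1)
# Toy data that is not linearly separable: one class inside a ring of the other (made up).
X = rng.normal(size=(200, 2))
y = np.where(np.linalg.norm(X, axis=1) > 1.0, 1, -1)

gamma = 0.5
clf = SVC(kernel="rbf", gamma=gamma, C=10.0).fit(X, y)

# f(x) = sum_i alpha_i y_i k(x_i, x) + b, summed over the support vectors only
K = rbf_kernel(X, clf.support_vectors_, gamma=gamma)      # k(x, x_i)
f_manual = K @ clf.dual_coef_.ravel() + clf.intercept_
print(np.allclose(f_manual, clf.decision_function(X)))    # True
```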
The offset $b$ is computed in feature space from any support vector $x_j$: $b = y_j - \sum_{i=1}^{M} \alpha_i\, y_i\, k(x_i, x_j)$.
[Figure: hyperplane and support vectors; color gradient = distance to the hyperplane.]
[Figure: hyperplane, support vectors, and the margin.]
$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{M} \alpha_i\, y_i\, k(x_i, x) + b \right)$
Examples of kernels: the RBF (Gaussian) kernel $k(x, x') = \exp\left( -\frac{\|x - x'\|^2}{2\sigma^2} \right)$ and the polynomial kernel $k(x, x') = \left( \langle x, x'\rangle + d \right)^p$.
The kernel has several open parameters (hyperparameters) that need to be determined before running SVM: the kernel width $\sigma$, the order of the polynomial $p$, and the offset $d$ (usually $d = 1$).
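Since the kernel hyperparameters and C are open, they are typically chosen by cross-validation. A minimal sketch with scikit-learn's GridSearchCV (toy data and grid values made up for illustration); note that scikit-learn parameterizes the RBF kernel through gamma rather than the width sigma directly.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = np.where(np.linalg.norm(X, axis=1) > 1.0, 1, -1)

# gamma plays the role of the kernel width; the grid values are illustrative only.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1, 10]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```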
$\min_{w,\, b,\, \xi}\ \frac{1}{2}\|w\|^2 + C \sum_{j=1}^{M} \xi_j$
$C$, which determines the cost associated with incorrectly classifying datapoints, is an open parameter of the optimization.
RBF kernel width=0.20; C=1000; several misclassified datapoints
RBF kernel width=0.20; C=2000; fewer misclassified datapoints
RBF kernel width=0.001; C=1000; 113 support vectors out of 345 datapoints in total
RBF kernel width=0.008; C=1000; 64 support vectors out of 345 datapoints in total
RBF kernel width=0.02; C=1000; 33 support vectors out of 345 datapoints in total
Several combinations of the support vectors yield the same optimum.
Determining $C$ in $\min_{w,\, b,\, \xi}\ \frac{1}{2}\|w\|^2 + C \sum_{j=1}^{M} \xi_j$ may be difficult in practice.
$\nu$-SVM is an alternative formulation that automatically optimizes the tradeoff between model complexity (the largest margin) and the penalty on the error:
$\min_{w,\, b,\, \xi,\, \rho}\ \frac{1}{2}\|w\|^2 - \nu\rho + \frac{1}{M}\sum_{i=1}^{M} \xi_i, \quad \text{subject to} \quad y_i\left(\langle w, x_i\rangle + b\right) \ge \rho - \xi_i, \ \ \xi_i \ge 0, \ \ \rho \ge 0.$
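scikit-learn exposes this formulation as NuSVC, where nu is an upper bound on the fraction of margin errors and a lower bound on the fraction of support vectors. A minimal sketch (toy data and values made up for illustration) that mirrors the plots below, where both the number of support vectors and the error grow with nu:

```python
import numpy as np
from sklearn.svm import NuSVC

rng = np.random.default_rng(3)
# Two overlapping toy classes (made up for illustration).
X = np.vstack([rng.normal(loc=[0.0, 0.0], size=(100, 2)),
               rng.normal(loc=[2.0, 2.0], size=(100, 2))])
y = np.hstack([-np.ones(100), np.ones(100)])

for nu in [0.05, 0.2, 0.5]:
    clf = NuSVC(nu=nu, kernel="rbf", gamma=0.5).fit(X, y)
    err = (clf.predict(X) != y).mean()
    print(f"nu={nu:.2f}  support vectors={len(clf.support_)}  training error={err:.2f}")
```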
ν-SVM with ν = 0.001, RBF kernel width 0.1
Increase in the number of support vectors with ν = 0.2
Increase in the number of support vectors with ν = 0.9
Increase in the error with ν = 0.2
Increase in the error with ν = 0.9
SVM has a wide range of applications for all types of data (vision, text, handwriting, etc.). It is very powerful for large-scale classification: optimized solvers exist for the training stage, and recall is rapid. One issue is that the computation at recall grows linearly with the number of support vectors, and the algorithm is not very sparse in support vectors. Another issue is that it can predict only two classes; for multi-class classification, one needs to run several two-class classifiers.
[Example: three classes, Children, Female Adult, Male Adult.]
One-versus-all: construct a set of $K$ binary classifiers $f^1, \dots, f^K$ (e.g. $f^1, f^2, f^3$ for $K = 3$), each trained to separate one class from the rest. Compute the class label in a winner-take-all approach:
$j^* = \arg\max_{j = 1, \dots, K} \ \sum_{i=1}^{M} \alpha_i^j\, y_i\, k(x_i, x) + b^j.$
It is sufficient to compute only $K - 1$ classifiers for $K$ classes, but computing the $K$-th classifier may provide tighter bounds on the $K$-th class.
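A minimal sketch of the winner-take-all scheme with scikit-learn (toy three-class data made up for illustration): one binary SVM is trained per class against the rest, and the predicted label is the class whose classifier function is largest.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier

rng = np.random.default_rng(4)
# Three toy classes (made up): blobs around three different centers.
centers = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
X = np.vstack([rng.normal(loc=c, size=(50, 2)) for c in centers])
y = np.repeat([0, 1, 2], 50)

# One binary SVM per class (class k versus the rest), winner-take-all on the decision values.
ovr = OneVsRestClassifier(SVC(kernel="rbf", gamma=0.5, C=10.0)).fit(X, y)
scores = ovr.decision_function(X)        # shape (n_samples, K)
pred = scores.argmax(axis=1)             # winner-take-all over the K classifier functions
print((pred == ovr.predict(X)).all())    # consistent with the built-in prediction
```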
Drawbacks of combining multiple binary classifiers:
A point may be claimed by several classes or by none (one can use a garbage model or a threshold on the minimum of the associated classifier functions, but it is difficult to compare the scales of the different classifier functions).
The classes may be unbalanced (some classes have many more examples than others); one can play with the C penalty to give a relative influence as a function of the number of patterns, e.g. C = 10*M. An alternative is to compute all the classes as part of a single optimization (J. Weston and C. Watkins. Multi-class support vector machines, 1998).
Even though SVM usually results in a relatively small number of support vectors compared to the total number of datapoints, nothing ensures that we obtain a sparse solution. SVM also requires finding hyperparameters (C, kernel parameters) and a special form for the basis function (the kernel must satisfy the Mercer conditions). The Relevance Vector Machine (RVM) relaxes these two assumptions by taking a Bayesian approach.
Start from the solution of SVM:
$y(x) = f(x) = \operatorname{sgn}\left( \sum_{i=1}^{M} \alpha_i\, k(x, x_i) + b \right), \qquad \alpha = (\alpha_1, \dots, \alpha_M)^T.$
Rewrite the solution of SVM as a linear combination over $M$ basis functions; a sparse solution has the majority of the entries of $\alpha$ equal to zero.
Model the distribution of the class label with a Bernoulli distribution, and replace the sign function with the continuous but steep sigmoid function $g(z) = 1/(1 + e^{-z})$:
$p(y \mid x; \alpha) = g\!\left( \sum_{i=1}^{M} \alpha_i\, k(x, x_i) + b \right)^{y} \left[ 1 - g\!\left( \sum_{i=1}^{M} \alpha_i\, k(x, x_i) + b \right) \right]^{1 - y}.$
Doing maximum likelihood would lead to overfitting, as we have more parameters than datapoints. Instead, approximate the distribution of the $\alpha$ with a probability density function, which reduces the number of parameters to estimate. The optimal $\alpha$ cannot be computed in closed form; one must perform an iterative method similar to expectation maximization (see Tipping 2001, supplementary material).
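To make the probabilistic model concrete, here is a minimal numpy sketch (kernel, weights and inputs are all made up for illustration) of the sigmoid-Bernoulli likelihood that replaces the sign function; it is only the forward model, not Tipping's iterative estimation of the alpha.

```python
import numpy as np

def rbf(x, xs, width=0.1):
    """Illustrative RBF basis functions k(x, x_i) for scalar inputs."""
    return np.exp(-((x - xs) ** 2) / (2.0 * width ** 2))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up training inputs, a sparse weight vector alpha (mostly zeros) and offset b.
x_train = np.array([0.0, 0.5, 1.0, 1.5])
alpha = np.array([0.0, 2.0, 0.0, -1.5])
b = 0.1

def p_y1(x):
    """p(y=1 | x) = g( sum_i alpha_i k(x, x_i) + b )."""
    return sigmoid(alpha @ rbf(x, x_train) + b)

def bernoulli_likelihood(y, x):
    """p(y | x) for a label y in {0, 1}."""
    p = p_y1(x)
    return p ** y * (1.0 - p) ** (1 - y)

print(p_y1(0.6), bernoulli_likelihood(1, 0.6))
```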
This parameter determines how good the fit is: the smaller its value, the more closely the true parameters are fitted.
RVM with l = 0.04, kernel width = 0.01; 477 datapoints, 19 support vectors
SVM with C = 1000, kernel width = 0.01; 477 datapoints, 51 support vectors
Pros:
Provides a probabilistic estimate of class membership (uncertainty on the prediction of the class label).
Cons:
(…precision)
Training is iterative and does not scale well (memory and computation per iteration grow with the square and the cube of the number of basis functions, i.e. with the number of datapoints).