Introduction Maximum Margin Multiple Classes Regression Example Summary Sparse Kernel Machines - SVM Henrik I. Christensen Robotics & Intelligent Machines @ GT Georgia Institute of Technology, Atlanta, GA 30332-0280 hic@cc.gatech.edu Henrik I. Christensen (RIM@GT) Support Vector Machines 1 / 42
Outline
1. Introduction
2. Maximum Margin Classifiers
3. Multi-Class SVMs
4. The regression case
5. Small Example
6. Summary
Introduction
Last time we talked about kernels and memory-based models
Estimating the full Gram matrix can pose a major challenge
Desirable to store only the “relevant” data
Two possible solutions discussed
  1. Support Vector Machines (Vapnik, et al.)
  2. Relevance Vector Machines
The main difference is in how posterior probabilities are handled
Small robotics example at the end to show SVM performance
Maximum Margin Classifiers - Preliminaries
Let's initially consider a linear two-class problem
  $y(x) = w^T \phi(x) + b$
with $\phi(\cdot)$ being a feature space transformation and $b$ the bias
Given a training dataset $x_i$, $i \in \{1, \ldots, N\}$
Target values $t_i$, $i \in \{1, \ldots, N\}$, $t_i \in \{-1, 1\}$
Assume for now that the problem is linearly separable in feature space
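As a minimal sketch of the model on this slide, the snippet below evaluates $y(x) = w^T \phi(x) + b$ and turns its sign into a label in $\{-1, 1\}$. The identity feature map and the values of `w` and `b` are assumptions chosen by hand for illustration; they are not learned and do not come from the slides.

```python
def phi(x):
    # Assumed feature map for illustration: the identity, phi(x) = x.
    return list(x)

def y(x, w, b):
    # Linear discriminant y(x) = w^T phi(x) + b.
    return sum(wi * fi for wi, fi in zip(w, phi(x))) + b

def classify(x, w, b):
    # Predicted target in {-1, +1}, given by the sign of y(x).
    return 1 if y(x, w, b) >= 0 else -1

# Hypothetical parameters, picked by hand rather than learned.
w = [1.0, -1.0]
b = 0.5
print(y([2.0, 1.0], w, b), classify([0.0, 2.0], w, b))
```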
The objective
The objective here is to maximize the margin
Let's just keep the points at the margin
[Figure: decision boundary $y = 0$ with margin boundaries $y = 1$ and $y = -1$, with the margin marked]
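Recap distances and metrics
[Figure: geometry of a linear discriminant in the $(x_1, x_2)$ plane. The decision surface $y = 0$ separates region $R_1$ ($y > 0$) from region $R_2$ ($y < 0$); $w$ is normal to the surface; a point $x$ with orthogonal projection $x_\perp$ lies at signed distance $y(x)/\|w\|$ from the surface; the surface lies at distance $-w_0/\|w\|$ from the origin (here $w_0$ denotes the bias)]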
The objective function
We know that $y(x)$ and $t$ are supposed to have the same sign so that $y(x)\,t > 0$, i.e. the distance of $x_n$ to the decision surface is
  $\frac{t_n y(x_n)}{\|w\|} = \frac{t_n (w^T \phi(x_n) + b)}{\|w\|}$
The solution is then
  $\arg\max_{w, b} \left\{ \frac{1}{\|w\|} \min_n \left[ t_n (w^T \phi(x_n) + b) \right] \right\}$
We can scale $w$ and $b$ without loss of generality
Scale the parameters so that the points closest to the decision surface satisfy
  $t_n (w^T \phi(x_n) + b) = 1$
Then for all data points it is true that
  $t_n (w^T \phi(x_n) + b) \geq 1$
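The scaling argument can be checked numerically. The sketch below computes the geometric margin $\min_n t_n y(x_n) / \|w\|$ and then rescales $(w, b)$ so that the closest point satisfies $t_n y(x_n) = 1$; the margin itself does not change. The toy data, the identity feature map, and the function names are assumptions made for illustration, and the rescaling is only meaningful when every point is correctly classified.

```python
import math

def y(x, w, b):
    # Linear discriminant with an identity feature map (an assumption).
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def geometric_margin(X, t, w, b):
    # Distance from the decision surface to the closest training point.
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return min(tn * y(xn, w, b) for xn, tn in zip(X, t)) / norm_w

def canonical_scale(X, t, w, b):
    # Rescale (w, b) so the closest point satisfies t_n * y(x_n) = 1.
    # Requires a separating (w, b), i.e. min_n t_n * y(x_n) > 0.
    s = min(tn * y(xn, w, b) for xn, tn in zip(X, t))
    return [wi / s for wi in w], b / s

# Hypothetical separable toy data and a hand-picked separating hyperplane.
X = [[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]]
t = [1, 1, -1, -1]
w, b = [2.0, 2.0], -4.0

w_c, b_c = canonical_scale(X, t, w, b)
print(geometric_margin(X, t, w, b), geometric_margin(X, t, w_c, b_c))
```

After rescaling, the margin equals $1 / \|w\|$, which is why maximizing the margin reduces to minimizing $\|w\|$.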
Parameter estimation
We need to maximize $\|w\|^{-1}$, which can be seen as minimizing $\|w\|^2$ subject to the margin requirements
In Lagrange terms, with multipliers $a_n \geq 0$, this is then
  $L(w, b, a) = \frac{1}{2}\|w\|^2 - \sum_{n=1}^{N} a_n \left\{ t_n (w^T \phi(x_n) + b) - 1 \right\}$
Analyzing partial derivatives gives us
  $w = \sum_{n=1}^{N} a_n t_n \phi(x_n)$
  $0 = \sum_{n=1}^{N} a_n t_n$
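For reference, the two conditions follow from setting the derivatives of the Lagrangian to zero, and substituting them back removes $w$ and $b$. The derivation below is a sketch that assumes the kernel is defined as $k(x_n, x_m) = \phi(x_n)^T \phi(x_m)$.

```latex
\begin{align*}
\frac{\partial L}{\partial w} &= w - \sum_{n=1}^{N} a_n t_n \phi(x_n) = 0
  &&\Rightarrow\quad w = \sum_{n=1}^{N} a_n t_n \phi(x_n) \\
\frac{\partial L}{\partial b} &= -\sum_{n=1}^{N} a_n t_n = 0
  &&\Rightarrow\quad \sum_{n=1}^{N} a_n t_n = 0 \\[4pt]
\tfrac{1}{2}\|w\|^2 &= \tfrac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} a_n a_m t_n t_m k(x_n, x_m) \\
\sum_{n=1}^{N} a_n t_n \left( w^T \phi(x_n) + b \right)
  &= \sum_{n=1}^{N} \sum_{m=1}^{N} a_n a_m t_n t_m k(x_n, x_m) + b \sum_{n=1}^{N} a_n t_n
   = \sum_{n=1}^{N} \sum_{m=1}^{N} a_n a_m t_n t_m k(x_n, x_m) \\
L &= \sum_{n=1}^{N} a_n - \tfrac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} a_n a_m t_n t_m k(x_n, x_m)
\end{align*}
```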
Parameter estimation
Eliminating $w$ and $b$ from the objective function we have
  $L(a) = \sum_{n=1}^{N} a_n - \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} a_n a_m t_n t_m k(x_n, x_m)$
with kernel $k(x_n, x_m) = \phi(x_n)^T \phi(x_m)$, to be maximized subject to $a_n \geq 0$ and $\sum_{n=1}^{N} a_n t_n = 0$
This is a quadratic optimization problem - we return to how to solve it shortly
We can evaluate new points using the form
  $y(x) = \sum_{n=1}^{N} a_n t_n k(x, x_n) + b$
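A small numerical sketch of the dual objective and of prediction from the multipliers follows. The linear kernel, the two-point data set, and the values of `a` and `b` are assumptions: the multipliers were worked out by hand for this toy problem, not produced by a solver. The prediction includes the bias $b$ of the model $y(x) = w^T \phi(x) + b$.

```python
def linear_kernel(u, v):
    # k(u, v) = u^T v, i.e. an identity feature map (an assumption).
    return sum(ui * vi for ui, vi in zip(u, v))

def dual_objective(a, t, X, k=linear_kernel):
    # L(a) = sum_n a_n - 1/2 sum_n sum_m a_n a_m t_n t_m k(x_n, x_m)
    n = len(a)
    quad = sum(a[i] * a[j] * t[i] * t[j] * k(X[i], X[j])
               for i in range(n) for j in range(n))
    return sum(a) - 0.5 * quad

def predict(x, a, t, X, b, k=linear_kernel):
    # y(x) = sum_n a_n t_n k(x, x_n) + b
    return sum(an * tn * k(x, xn) for an, tn, xn in zip(a, t, X)) + b

# Hypothetical two-point problem; a and b were derived by hand for it.
X = [[1.0, 1.0], [-1.0, -1.0]]
t = [1, -1]
a = [0.25, 0.25]
b = 0.0

print(dual_objective(a, t, X), predict([1.0, 1.0], a, t, X, b))
```

For this toy problem the dual value equals $\frac{1}{2}\|w\|^2$ with $w = (0.5, 0.5)$, and both training points sit exactly on the margin.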
Estimation of the bias
Once $a$ (and hence $w$) has been estimated we can use it to estimate the bias
  $b = \frac{1}{N_S} \sum_{n \in S} \left( t_n - \sum_{m \in S} a_m t_m k(x_n, x_m) \right)$
where $S$ is the set of support vectors and $N_S$ is the number of support vectors
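The bias formula can be sketched as follows for the separable case, taking $S$ to be the indices with $a_n > 0$. The linear kernel, the tolerance `tol`, the toy data, and the multipliers (worked out by hand for this data) are all assumptions for illustration.

```python
def linear_kernel(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def estimate_bias(a, t, X, k=linear_kernel, tol=1e-8):
    # S: indices of the support vectors, i.e. points with a_n > 0.
    S = [n for n, an in enumerate(a) if an > tol]
    # Average t_n - sum_{m in S} a_m t_m k(x_n, x_m) over the support vectors.
    total = sum(t[n] - sum(a[m] * t[m] * k(X[n], X[m]) for m in S) for n in S)
    return total / len(S)

# Hypothetical separable data; the third point is not a support vector.
X = [[2.0, 2.0], [0.0, 0.0], [3.0, 3.0]]
t = [1, -1, 1]
a = [0.25, 0.25, 0.0]

b = estimate_bias(a, t, X)
print(b)
```

Averaging over all support vectors rather than using a single one is numerically more stable when the multipliers come from an iterative solver.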
Illustrative Synthetic Example
Status
We have formulated the objective function
  Still not clear how we will solve it!
We have assumed the classes are separable
  What about messier data?
Overlapping class distributions
Assume some data cannot be correctly classified
Let's define a margin distance (slack) $\xi_n \geq 0$: $\xi_n = 0$ for points on or beyond the correct margin boundary, and $\xi_n = |t_n - y(x_n)|$ otherwise
Consider
  1. $\xi = 0$ - correct classification, on or outside the margin
  2. $0 < \xi < 1$ - correct classification, between the margin and the decision boundary
  3. $\xi = 1$ - on the decision boundary
  4. $1 < \xi \leq 2$ - misclassified, between the decision boundary and the margin of the other class
  5. $\xi > 2$ - misclassified, beyond the margin of the other class
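The slack values can be computed directly from $t_n$ and $y(x_n)$. The sketch below assumes the common convention that the slack is clipped to zero for points on or beyond the correct margin boundary, so that $\xi_n = \max(0, 1 - t_n y(x_n))$, which equals $|t_n - y(x_n)|$ whenever $t_n y(x_n) < 1$. The function names and sample values are hypothetical.

```python
def slack(t_n, y_n):
    # xi_n = 0 on or beyond the correct margin, |t_n - y(x_n)| otherwise.
    return 0.0 if t_n * y_n >= 1 else abs(t_n - y_n)

def describe(xi):
    # Coarse interpretation of a slack value.
    if xi == 0:
        return "correct, on or outside the margin"
    if xi < 1:
        return "correct, inside the margin"
    if xi == 1:
        return "on the decision boundary"
    return "misclassified"

# Hypothetical (target, y(x)) pairs covering each case.
for t_n, y_n in [(1, 1.5), (1, 0.5), (1, 0.0), (-1, 0.5), (1, -2.0)]:
    print(t_n, y_n, slack(t_n, y_n), describe(slack(t_n, y_n)))
```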
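Overlap in margin
[Figure: decision boundary $y = 0$ with margin boundaries $y = -1$ and $y = 1$; points labelled $\xi = 0$ lie on the margin boundaries, a point with $\xi < 1$ lies inside the margin on the correct side, and a point with $\xi > 1$ lies on the wrong side of the decision boundary]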
Recasting the problem
Optimizing not just for $w$ but also for misclassification
So we minimize
  $C \sum_{n=1}^{N} \xi_n + \frac{1}{2}\|w\|^2$
where $C > 0$ is a regularization coefficient that trades off the slack penalty against the margin
We have a new objective function
  $L(w, b, a) = \frac{1}{2}\|w\|^2 + C \sum_{n=1}^{N} \xi_n - \sum_{n=1}^{N} a_n \left\{ t_n y(x_n) - 1 + \xi_n \right\} - \sum_{n=1}^{N} \mu_n \xi_n$
where $a_n \geq 0$ and $\mu_n \geq 0$ are Lagrange multipliers
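To make the trade-off concrete, the sketch below evaluates the soft-margin objective $C \sum_n \xi_n + \frac{1}{2}\|w\|^2$ for a fixed $(w, b)$, using the smallest slack that satisfies $t_n y(x_n) \geq 1 - \xi_n$ with $\xi_n \geq 0$. The identity feature map, the data, and the parameter values are assumptions for illustration only.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def soft_margin_objective(w, b, X, t, C):
    # Smallest feasible slack per point: xi_n = max(0, 1 - t_n * y(x_n)).
    xi = [max(0.0, 1.0 - tn * (dot(w, xn) + b)) for xn, tn in zip(X, t)]
    # Objective: C * sum_n xi_n + 1/2 * ||w||^2.
    return C * sum(xi) + 0.5 * dot(w, w), xi

# Hypothetical data: the third point lies on the decision boundary.
X = [[2.0, 2.0], [0.0, 0.0], [1.0, 1.0]]
t = [1, -1, 1]
w, b = [0.5, 0.5], -1.0

value, xi = soft_margin_objective(w, b, X, t, C=10.0)
print(value, xi)
```

A larger `C` makes each unit of slack more expensive, pushing the solution toward the separable (hard-margin) case.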
Optimization
As before we can take partial derivatives and find the extrema
The resulting objective function is then
  $L(a) = \sum_{n=1}^{N} a_n - \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} a_n a_m t_n t_m k(x_n, x_m)$
which is the same as before, but the constraints are a little different
  $0 \leq a_n \leq C$ and $\sum_{n=1}^{N} a_n t_n = 0$
which hold across all training samples
Many training samples will have $a_n = 0$, which is the same as saying they are not support vectors (they lie on the correct side, on or outside the margin)
Generating a solution
Solutions are generated through analysis of all training data
  Re-organization enables some optimization (Vapnik, 1982)
Sequential minimal optimization is a common approach (Platt, 2000)
  Considers pairwise interaction between Lagrange multipliers
  Complexity is somewhere between linear and quadratic in the number of training samples
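The idea of updating Lagrange multipliers in pairs can be sketched as below. This is a simplified pairwise coordinate ascent in the spirit of sequential minimal optimization, not Platt's full algorithm: it sweeps over all pairs in a fixed order, has no pair-selection heuristics, and runs a fixed number of sweeps instead of checking the KKT conditions. The function name, the linear kernel, and the toy data are assumptions.

```python
def linear_kernel(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def pairwise_dual_ascent(X, t, C, k=linear_kernel, sweeps=20):
    n = len(X)
    K = [[k(X[i], X[j]) for j in range(n)] for i in range(n)]
    a = [0.0] * n

    def f(i):
        # Decision value at x_i without the bias; the bias cancels in E_i - E_j.
        return sum(a[m] * t[m] * K[m][i] for m in range(n))

    for _ in range(sweeps):
        for i in range(n):
            for j in range(i + 1, n):
                eta = K[i][i] + K[j][j] - 2.0 * K[i][j]
                if eta <= 1e-12:
                    continue  # no curvature along this pair, skip it
                E_i, E_j = f(i) - t[i], f(j) - t[j]
                # Box for a_j that keeps 0 <= a <= C and sum_n a_n t_n fixed.
                if t[i] != t[j]:
                    lo, hi = max(0.0, a[j] - a[i]), min(C, C + a[j] - a[i])
                else:
                    lo, hi = max(0.0, a[i] + a[j] - C), min(C, a[i] + a[j])
                # Analytic optimum along the pair, clipped to the box.
                a_j = min(hi, max(lo, a[j] + t[j] * (E_i - E_j) / eta))
                a[i] += t[i] * t[j] * (a[j] - a_j)
                a[j] = a_j
    return a

# Hypothetical separable toy data.
X = [[2.0, 2.0], [0.0, 0.0], [3.0, 3.0]]
t = [1, -1, 1]
a = pairwise_dual_ascent(X, t, C=10.0)
print(a)
```

Updating two multipliers at a time is the smallest step that can keep the equality constraint $\sum_n a_n t_n = 0$ satisfied, which is why the method works on pairs.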
Mixed example
[Figure: two-dimensional example; both axes span $-2$ to $2$]
Multi-Class SVMs
Thus far the discussion has been for the two-class problem
How to extend to $K$ classes?
  1. One versus the rest
  2. Hierarchical trees - one vs one
  3. Coding the classes to generate a new problem
One versus the rest
Train one classifier per class, with all the other classes serving as the negative training samples
Typically training is skewed - too few positives compared to negatives
  Better fit for the negatives
The one vs all approach implies extra complexity in training, $\approx K^2$
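One way to combine the $K$ one-versus-the-rest classifiers at prediction time is to pick the class whose classifier gives the largest output $y_k(x)$. The slide does not say how the outputs are combined, so this rule is an assumption, as are the hand-picked linear scorers standing in for trained SVMs.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def one_vs_rest_predict(x, weights, biases):
    # One score y_k(x) = w_k^T x + b_k per class; choose the largest.
    scores = {c: dot(weights[c], x) + biases[c] for c in weights}
    # Ties go to the class listed first in the dictionary.
    return max(scores, key=scores.get), scores

# Hypothetical per-class linear scorers, not trained on any data.
weights = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [-1.0, -1.0]}
biases = {"A": 0.0, "B": 0.0, "C": 0.0}

label, scores = one_vs_rest_predict([2.0, 0.5], weights, biases)
print(label, scores)
```

Because each classifier is trained on a different problem, the outputs are not guaranteed to be on comparable scales, which is one known weakness of this combination rule.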
Tree classifier
Organize the problem as a tree selection
Best first elimination - select easy cases first
Based on pairwise comparison of classes
Still requires extra comparisons on the order of $K^2$ class pairs
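A possible reading of the elimination scheme is sketched below: keep a list of candidate classes, compare the first against the last with the matching pairwise classifier, and drop the loser until one class remains. This needs $K(K-1)/2$ pairwise classifiers but only $K - 1$ evaluations per test point. The pairwise scorers here are stand-ins based on distance to hypothetical class centers, not trained SVMs, and the candidate ordering is an assumption.

```python
def sq_dist(u, v):
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))

def make_pair(ci, cj):
    # Stand-in for a trained pairwise SVM: positive means "prefer class i over j".
    return lambda x: sq_dist(x, cj) - sq_dist(x, ci)

def eliminate(x, classes, pairwise):
    remaining = list(classes)
    evaluations = 0
    while len(remaining) > 1:
        i, j = remaining[0], remaining[-1]
        evaluations += 1
        if pairwise[(i, j)](x) > 0:
            remaining.pop()      # class j loses and is eliminated
        else:
            remaining.pop(0)     # class i loses and is eliminated
    return remaining[0], evaluations

# Hypothetical class centers used only to build the stand-in scorers.
centers = {"A": [0.0, 0.0], "B": [4.0, 0.0], "C": [0.0, 4.0]}
classes = list(centers)
pairwise = {(i, j): make_pair(centers[i], centers[j])
            for p, i in enumerate(classes) for j in classes[p + 1:]}

print(eliminate([3.5, 0.5], classes, pairwise))
```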
Coding new classes
Consider optimizing an error-correcting coding of the classes
How to choose the criterion function so as to minimize errors
Can be considered a generalization of voting-based strategies
Poses a larger training challenge
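Assuming the slide refers to error-correcting output codes, the decoding step can be sketched as follows: each class gets a codeword of $\pm 1$ entries, one binary classifier is trained per code position, and a test point is assigned to the class whose codeword is closest in Hamming distance to the vector of classifier outputs. The codebook below is hypothetical; it has minimum Hamming distance 3, so a single wrong binary classifier can be corrected.

```python
def hamming(u, v):
    # Number of positions in which two codewords differ.
    return sum(ui != vi for ui, vi in zip(u, v))

def decode(bits, codebook):
    # Assign the class whose codeword is nearest to the observed outputs.
    return min(codebook, key=lambda c: hamming(bits, codebook[c]))

# Hypothetical codebook: one row per class, one column per binary classifier.
codebook = {
    "A": (-1, -1, -1, -1, -1),
    "B": (1, 1, 1, -1, -1),
    "C": (-1, -1, 1, 1, 1),
    "D": (1, 1, -1, 1, 1),
}

# Outputs of the five binary classifiers for class "B" with one bit flipped.
observed = (1, -1, 1, -1, -1)
print(decode(observed, codebook))
```

Plain voting corresponds to a particular choice of codebook, which is the sense in which coding generalizes voting; the price is that more binary classifiers must be trained.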