On-line Support Vector Machine Regression

Mario Martín
Software Department – KEML Group
Universitat Politècnica de Catalunya

Index
• Motivation and antecedents
• Formulation of SVM regression
• Characterization of vectors in SVM regression
• Procedure for adding one vector
• Procedure for removing one vector
• Procedure for updating one vector
• Demo
• Discussion and conclusions

Motivation
• SVM has nice theoretical and practical properties:
  – Generalization
  – Convergence to the optimum solution
• These properties extend to SVM for regression (function approximation)
• But SVMs present some practical problems when applied to interesting problems

On-line applications
• What happens when:
  – You have trained your SVM but new data becomes available?
  – Some of your data must be updated?
  – Some data must be removed?
• In some applications we need efficient procedures to
  – Add new data
  – Remove old data
  – Update old data
On-line applications
• Some examples in regression:
  – Temporal series prediction: new data arrive for learning, but the system must predict from the first data (for instance, prediction of share values for companies in the market).
  – Active learning: the learning agent sequentially chooses, from a set of examples, the next data point from which to learn.
  – Reinforcement learning: the estimated Q target values for existing data change as learning goes on.

Antecedents
• (Cauwenberghs & Poggio, 2000) present a method for incrementally building exact SVMs for classification
• It allows us to incrementally add and remove vectors to/from the SVM
• Goals:
  – An efficient procedure, in memory and time, for solving SVMs
  – Efficient computation of the leave-one-out error

Incremental approaches
• (de Freitas et al., 2000):
  – Regression based on the Kalman filter and windowing.
  – Bayesian framework.
  – Not an exact method (only inside the window or with RBFs).
  – Not able to update or remove data.
• (Domeniconi & Gunopulos, 2001):
  – Train with n vectors. Keep the support vectors. Select heuristically the following k vectors from a set of m vectors. Then learn from scratch with the k vectors and the support vectors.

On-line SVM regression
• Based on the C&P method, but applied to regression.
• Goal: allow the application of SVM regression to on-line problems.
• Essence of the method:
  "Add/remove/update one vector by varying its influence on the regression tube in the right direction, until it reaches a consistent KKT condition, while maintaining the KKT conditions of the remaining vectors."
SVM regression
• See the excellent slides of Belanche's talk.
• In particular, we are interested in ε-insensitive support vector machine regression:
  Goal: find a function that presents at most ε deviation from the target values while being as "flat" as possible.

Graphical example
• [Figure: the ε-tube around the regression function.]

Formulation of SVM regression
• The dual formulation of ε-insensitive support vector regression consists in finding the values of α, α* that minimize a quadratic objective function (reconstructed below):
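For reference, a standard form of the ε-insensitive SVR dual referred to above (the slide's own notation may differ slightly; Q_ij = K(x_i, x_j) is assumed):

\[
\min_{\alpha,\alpha^*}\; W \;=\; \tfrac{1}{2}\sum_{i,j}(\alpha_i-\alpha_i^*)(\alpha_j-\alpha_j^*)\,Q_{ij}
\;+\;\varepsilon\sum_i(\alpha_i+\alpha_i^*)\;-\;\sum_i y_i(\alpha_i-\alpha_i^*)
\]
\[
\text{subject to}\qquad \sum_i(\alpha_i-\alpha_i^*) = 0,\qquad 0 \le \alpha_i,\alpha_i^* \le C .
\]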
Computing b
• Adding b as the Lagrange coefficient of the equality constraint, the constraint is included in the formulation; the resulting objective and its box constraints are reconstructed below.

Solution to the dual formulation
• Regression function (reconstructed below).

Characterization of vectors in SVM regression
• KKT conditions:
  – α_i · α_i* = 0
  – α_i^(*) = C only for points outside the ε-tube
  – α_i^(*) ∈ (0, C) → i lies in the margin
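Reconstructed expressions for the two slides above, assuming the Cauwenberghs & Poggio style of folding the equality constraint into the objective via b (the exact slide notation is an assumption):

\[
W \;=\; \tfrac{1}{2}\sum_{i,j} Q_{ij}\,\beta_i\beta_j \;+\; \varepsilon\sum_i(\alpha_i+\alpha_i^*) \;-\; \sum_i y_i\,\beta_i \;+\; b\sum_i \beta_i,
\qquad \beta_i = \alpha_i-\alpha_i^*,
\]
subject to \(0 \le \alpha_i, \alpha_i^* \le C\), and the resulting regression function is
\[
f(x) \;=\; \sum_i \beta_i\, K(x_i, x) \;+\; b .
\]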
Obtaining FO conditions
• We will characterize vectors by using the KKT conditions and by differentiating the dual SVM regression formulation with respect to the Lagrange coefficients (first-order, FO, conditions).
• Comparing with the solution and renaming terms gives the margin functions g_i and g_i* (reconstructed below).
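The slide's derivation is reconstructed here from the dual above, with the renaming β_i = α_i − α_i* and Q_ij = K(x_i, x_j); the sign convention for g and g* is assumed, chosen to be consistent with the adding procedure described later:

\[
g_i \;=\; \frac{\partial W}{\partial \alpha_i} \;=\; \sum_j Q_{ij}\beta_j + b - y_i + \varepsilon \;=\; f(x_i) - y_i + \varepsilon,
\]
\[
g_i^* \;=\; \frac{\partial W}{\partial \alpha_i^*} \;=\; -\sum_j Q_{ij}\beta_j - b + y_i + \varepsilon \;=\; y_i - f(x_i) + \varepsilon,
\]
so that \(g_i + g_i^* = 2\varepsilon\) for every vector.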
Reformulation of FO conditions (1)
• TO KEEP IN MIND: g allows us to classify vectors according to their membership in the sets R, S, E and E*.
• A complete characterization of the SVM also requires knowing β for the vectors in the margin.
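A minimal sketch of how the sets could be determined from g, g* and β, assuming the reconstructed definitions above. Set names follow the slides, but the thresholds, the tie-breaking and which of E/E* corresponds to which sign of β are illustrative assumptions:

```python
def classify_vector(g, g_star, beta, C, tol=1e-9):
    """Assign one training vector to R, S, E or E* from its margin values
    g, g* and its coefficient beta = alpha - alpha* (illustrative sketch)."""
    if g > tol and g_star > tol:
        return "R"                        # strictly inside the eps-tube, beta = 0
    if abs(beta) < C - tol:
        return "S"                        # on the tube boundary (g = 0 or g* = 0)
    return "E" if beta > 0 else "E*"      # outside the tube, |beta| = C
```

Here R holds the vectors inside the tube, S the margin (boundary) vectors, and E, E* the error vectors with β at +C and −C (one side of the tube each).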
Reformulation of FO conditions (2)
• (Will be used later...)

Adding one vector

Procedure
• Does the new vector c have any influence on the regression tube?
  – Compute g_c and g_c*.
  – If both values are positive, the new point lies inside the ε-tube and β_c = 0.
  – If g_c < 0, then β_c must be incremented until it reaches a consistent KKT condition.
  – If g_c* < 0, then β_c must be decremented until it reaches a consistent KKT condition.
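A small, self-contained sketch of this first step (checking the influence of a candidate vector), assuming the reconstructed definitions of g and g*. The kernel and all function and variable names are illustrative, not taken from the author's Matlab code:

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """Gaussian RBF kernel matrix between the rows of X and the rows of Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def candidate_margins(x_c, y_c, X, beta, b, eps, kernel=rbf_kernel):
    """g_c and g_c* for a candidate (x_c, y_c) given the current machine:
    training inputs X, coefficients beta = alpha - alpha*, and bias b."""
    f_c = float(kernel(x_c[None, :], X) @ beta) + b   # current prediction at x_c
    return f_c - y_c + eps, y_c - f_c + eps           # (g_c, g_c*)

# Decision for the new vector (sketch):
#   g_c >= 0 and g_c* >= 0  ->  c joins R with beta_c = 0, nothing else changes
#   g_c  < 0                ->  increase beta_c step by step (towards +C)
#   g_c* < 0                ->  decrease beta_c step by step (towards -C)
```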
But ...
• Increasing or decreasing β_c changes the ε-tube, and thus g_i, g_i* and β_i of the vectors already in D...
• Even more: increasing or decreasing β_c can change the membership of vectors in the sets R, S, E and E*.

Step by step
• First, assume that the variation in β_c is so small that it does not change the membership of any vector.
• In this case, how does the variation in β_c change g_i, g_i* and β_i of the other vectors, assuming these vectors do not transfer from one set to another?

Changes in g_i and g_i* by modifying β_c (see the reconstruction below)
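A reconstruction of the sensitivity equations for these slides, following the gradient definitions above. While set memberships are frozen, only the bias b and the β of the margin vectors can move; the Δ notation for finite increments is assumed:

\[
\Delta g_i \;=\; Q_{ic}\,\Delta\beta_c \;+\; \sum_{j\in S} Q_{ij}\,\Delta\beta_j \;+\; \Delta b,
\qquad
\Delta g_i^* \;=\; -\,\Delta g_i \qquad\text{for every vector } i .
\]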
Equations valid for all vectors (while vectors do not migrate)
• (See the equations reconstructed above.)

Changes in Σ β_j
• The equality constraint Σ_j β_j = 0 must keep holding, so the changes in β must compensate each other (see the reconstruction below).

Vectors in the margin
• If vectors do not change their membership to the sets then, for every vector i in the margin, Δg_i = Δg_i* = 0.
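The constraint slide and the margin-vector system, reconstructed. The symbols δ and R match those mentioned later in the computational-resources slide, but their exact definition on the lost slides is an assumption:

\[
\Delta\beta_c + \sum_{j\in S}\Delta\beta_j = 0,
\qquad
\begin{pmatrix} 0 & \mathbf{1}^{\top} \\ \mathbf{1} & Q_{SS} \end{pmatrix}
\begin{pmatrix} \Delta b \\ \Delta\beta_S \end{pmatrix}
= -\begin{pmatrix} 1 \\ Q_{Sc} \end{pmatrix}\Delta\beta_c,
\]
hence
\[
\begin{pmatrix} \Delta b \\ \Delta\beta_S \end{pmatrix} = \delta\,\Delta\beta_c,
\qquad
\delta = -R \begin{pmatrix} 1 \\ Q_{Sc} \end{pmatrix},
\qquad
R = \begin{pmatrix} 0 & \mathbf{1}^{\top} \\ \mathbf{1} & Q_{SS} \end{pmatrix}^{-1}.
\]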
Vectors not in the margin
• TO KEEP IN MIND: for vectors not in the margin, β_i does not change; only g_i and g_i* change (see the reconstruction below).
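A reconstruction of the non-margin update, using the symbol γ mentioned in the computational-resources slide (the exact form on the lost slide is assumed):

\[
\text{For } i \notin S:\qquad \Delta\beta_i = 0,
\qquad
\Delta g_i = \gamma_i\,\Delta\beta_c,
\qquad
\Delta g_i^* = -\gamma_i\,\Delta\beta_c,
\qquad
\gamma_i = Q_{ic} + \begin{pmatrix} 1 & Q_{iS}\end{pmatrix}\delta .
\]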
Procedure
• Repeat:
  – Change β_c by the largest step that does not make any vector migrate between sets.
  – If some vector reaches a migration point, transfer it between sets and update R.
  until the KKT condition of c becomes consistent.

Computational resources
• Memory:
  – Keep g for the vectors not in S
  – Keep β for the vectors in S
  – Keep R (dimensions: |S|²)
  – Keep Q_ij for i, j in S (dimensions: |S|²)

Computational resources
• Time:
  – Still not deeply studied, but:
    • Maximum 2|D| iterations for adding one new vector
    • Linear cost for computing γ, δ and R
    • An empirical comparison with QP shows that this method is at least one order of magnitude faster for learning the whole training set
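A sketch of the state the method keeps between updates, directly mirroring the memory bullet above; the field names are illustrative, not the author's:

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class OnlineSVRState:
    """State kept between on-line updates (illustrative sketch)."""
    eps: float                                 # tube half-width
    C: float                                   # box constraint
    b: float = 0.0                             # bias term
    S: list = field(default_factory=list)      # indices of margin vectors
    beta_S: np.ndarray = None                  # beta for vectors in S
    g: dict = field(default_factory=dict)      # g_i for vectors not in S
    R: np.ndarray = None                       # inverse of [[0, 1^T], [1, Q_SS]]
    Q_SS: np.ndarray = None                    # kernel matrix restricted to S
```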
[Computational details]

Transfer of vectors between sets
• Transfers happen only between neighbor sets:
  – From E to S
  – From S to E
  – From S to R
  – From R to S
  – From S to E*
  – From E* to S

Transfer of vectors
• Always from/to S to/from R, E or E*:
  – Update the vector's membership to the sets
  – Create/remove its β entry
  – Create/remove its g entry
  – Update the R matrix

Efficient update of R matrix
• Naive procedure: maintain the matrix and recompute its inverse at every transfer ... inefficient.
• A better approach: adapt the Cauwenberghs & Poggio recursive update to regression (see the sketch below).
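Following the last bullet, a minimal numpy sketch of that recursive update, written for the regression matrix [[0, 1ᵀ], [1, Q_SS]]; function names and the indexing convention are assumptions, not the author's Matlab code. It covers the three cases listed on the next slide (trivial case, adding a margin vector, removing a margin vector):

```python
import numpy as np

def init_R(Q_cc):
    """Trivial case: R when the first margin support vector c enters S,
    i.e. the inverse of [[0, 1], [1, Q_cc]]."""
    return np.array([[-Q_cc, 1.0], [1.0, 0.0]])

def expand_R(R, Q_Sc, Q_cc):
    """Recursively update R = inv([[0, 1^T], [1, Q_SS]]) when the margin
    support vector c joins S (rank-one update in the style of
    Cauwenberghs & Poggio, adapted here to regression)."""
    v = np.concatenate(([1.0], Q_Sc))       # column [1; Q_Sc] of the enlarged matrix
    delta = -R @ v                          # sensitivities of (b, beta_S) w.r.t. beta_c
    gamma_c = Q_cc + v @ delta              # margin sensitivity of c itself
    n = R.shape[0]
    R_new = np.zeros((n + 1, n + 1))
    R_new[:n, :n] = R
    u = np.concatenate((delta, [1.0]))
    return R_new + np.outer(u, u) / gamma_c

def shrink_R(R, k):
    """Recursively update R when the margin support vector stored at
    row/column k (k >= 1; row/column 0 belongs to the bias b) leaves S."""
    rest = np.delete(np.arange(R.shape[0]), k)
    return R[np.ix_(rest, rest)] - np.outer(R[rest, k], R[k, rest]) / R[k, k]
```

Note that expand_R reuses exactly the δ and γ_c sensitivities that the adding procedure already computes, which is why the update costs only linear time in |S| per transfer instead of a full matrix inversion.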
Recursive update
• Adding one margin support vector c (see the sketch above)
• Removing one margin support vector (see the sketch above)

Trivial case
• Adding the first margin support vector

Removing one vector
Updating target value for one vector

Update target value
• Obvious way: remove the vector and add it again with its new target value.
• More efficient way:
  – Compute g and g* for the new target value.
  – Determine whether the influence of the vector should be increased or decreased (and in which direction).
  – Update β_c "carefully" until the status of c becomes consistent with a KKT condition.

Matlab Demo

Conclusion and Discussion
Conclusions
• We have seen an on-line learning method for SVMs that:
  – Is an exact method
  – Is efficient in memory and time
  – Allows the application of SVM for classification and regression to on-line applications

Some possible future applications
• On-line learning in classification:
  – Incremental learning.
  – Active learning.
  – Transduction.
  – ...
• On-line regression:
  – Prediction in real-time temporal series.
  – Generalization in Reinforcement Learning.
  – ...

Software and future extensions
• Matlab code for regression available from http://www.lsi.upc.es/~mmartin/svmr.html
• Future extension to ν-SVM and adaptive margin algorithms
  [It seems extensible to ν-SVM, but not (yet) to SVM regression with other loss functions such as the quadratic or Huber losses.]