Kernel Methods for Regression: Support Vector Regression


  1. Kernel Methods for Regression: Support Vector Regression, Gaussian Mixture Regression, Gaussian Process Regression (MACHINE LEARNING – 2012)

  2. Problem Statement. Predict an output $y$ from an input $x$ through a non-linear function $f$: $y = f(x)$. Estimate the $f$ that best predicts the set of training points $\{x^i, y^i\}$, $i = 1, \dots, M$. (Figure: four training points $(x^1, y^1), \dots, (x^4, y^4)$ in the $(x, y)$ plane.)

  3. Non-linear Regression and the Kernel Trick. Non-linear regression: fit the data with a function that is not linear in the parameters, $y = f(x; \theta)$, where $\theta$ denotes the parameters of the function. Non-parametric regression: use the data to determine the parameters of the function, so that the problem can again be phrased as a linear regression problem. Kernel trick: send the data into feature space with a non-linear function and perform linear regression in feature space, $y = \sum_i \alpha_i \, k(x, x^i)$, where the $x^i$ are the datapoints and $k$ is the kernel function.
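To make the kernel-trick formulation concrete, here is a minimal sketch of a data-driven predictor of the form $y = \sum_i \alpha_i k(x, x^i)$. The RBF kernel, the synthetic sinusoidal data, and the ridge-regularised fit of the coefficients are illustrative assumptions, not the SVR solution derived in the later slides.

```python
import numpy as np

def rbf_kernel(a, b, gamma=10.0):
    """k(a, b) = exp(-gamma * (a - b)^2), evaluated pairwise for 1-D inputs."""
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

# synthetic training set (illustrative assumption)
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, 20)
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.standard_normal(20)

# fit the alpha_i so that y ~ sum_i alpha_i k(x, x^i) (ridge-regularised least squares)
K = rbf_kernel(x_train, x_train)
alpha = np.linalg.solve(K + 1e-3 * np.eye(len(x_train)), y_train)

# predict at new inputs: linear regression in feature space via the kernel
x_test = np.linspace(0.0, 1.0, 100)
y_pred = rbf_kernel(x_test, x_train) @ alpha
```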

  4. Data-driven Regression. Good prediction depends on the choice of datapoints. (Figure: blue, true function; red, estimated function.)

  5. Data-driven Regression. Good prediction depends on the choice of datapoints: the more datapoints, the better the fit, but the computational cost increases dramatically with the number of datapoints. (Figure: blue, true function; red, estimated function.)

  6. Kernel Methods for Regression. There are several methods in ML for performing non-linear regression; they differ in the objective function and in the number of parameters. Gaussian Process Regression (GPR) uses all datapoints. (Figure: blue, true function; red, estimated function.)

  7. Kernel Methods for Regression. There are several methods in ML for performing non-linear regression; they differ in the objective function and in the number of parameters. Gaussian Process Regression (GPR) uses all datapoints. Support Vector Regression (SVR) picks a subset of the datapoints (the support vectors). (Figure: blue, true function; red, estimated function.)

  8. Kernel Methods for Regression. There are several methods in ML for performing non-linear regression; they differ in the objective function and in the number of parameters. Gaussian Process Regression (GPR) uses all datapoints. Support Vector Regression (SVR) picks a subset of the datapoints (the support vectors). Gaussian Mixture Regression (GMR) generates a new set of datapoints (the centers of the Gaussian functions). (Figure: blue, true function; red, estimated function.)

  9. Kernel Methods for Regression. Deterministic regressive model: $y = f(x)$. Probabilistic regressive model: $y = f(x) + \epsilon$, with $\epsilon \sim N(0, \sigma^2)$. Build an estimate of the noise model and then compute $f$ directly (Support Vector Regression).
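A minimal sketch of generating data from the probabilistic regressive model above; the sinusoidal $f$ and the noise level $\sigma = 0.1$ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def f_true(x):
    # stand-in for the unknown underlying function (illustrative assumption)
    return np.sin(2 * np.pi * x)

x = rng.uniform(0.0, 1.0, 30)
eps = 0.1 * rng.standard_normal(30)   # epsilon ~ N(0, sigma^2), with sigma = 0.1
y = f_true(x) + eps                   # probabilistic regressive model y = f(x) + epsilon
```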

  10. Support Vector Regression

  11. Support Vector Regression (SVR). Assume a non-linear mapping $f$, such that $y = f(x)$. How do we estimate the $f$ that best predicts the pairs of training points $\{x^i, y^i\}$, $i = 1, \dots, M$? How do we generalize the support vector machine framework for classification to estimate continuous functions? 1. Assume a non-linear mapping through feature space and then perform linear regression in feature space. 2. Supervised learning minimizes an error function, so first determine a way to measure the error on the testing set in the linear case!

  12. Support Vector Regression. Assume a linear mapping $f$, such that $y = f(x) = w^T x + b$. How do we estimate $w$ and $b$ to best predict the pairs of training points $\{x^i, y^i\}$, $i = 1, \dots, M$? Measure the error on the prediction. ($b$ is estimated through least-square regression on the support vectors; hence we omit it from the rest of the development.)

  13. Support Vector Regression. Set an upper bound $\epsilon$ on the error and consider as correctly classified all points such that $|f(x) - y| \le \epsilon$. Penalize only the datapoints that are not contained in the $\epsilon$-tube. (Figure: datapoints around $f(x)$ with the $\epsilon$-insensitive tube bounded by $+\epsilon$ and $-\epsilon$.)
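The penalty just described is the $\epsilon$-insensitive loss: zero inside the tube and linear outside it. A minimal sketch; the function name and the default $\epsilon = 0.1$ are illustrative choices.

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """|y - f(x)|_eps = max(0, |y - f(x)| - eps): zero inside the eps-tube,
    growing linearly with the distance to the tube outside it."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - eps)

# only the last point, lying 0.4 outside the tube, is penalized
print(eps_insensitive_loss(np.array([1.0, 1.05, 1.5]), np.array([1.0, 1.0, 1.0])))
```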

  14. Support Vector Regression. The $\epsilon$-margin is a measure of the width of the $\epsilon$-insensitive tube, and hence of the precision of the regression. A small $\|w\|$ corresponds to a small slope for $f$: in the linear case $y = \langle w, x \rangle + b$, $f$ is more horizontal. (Figure: the $\epsilon$-margin around the line $y = \langle w, x \rangle + b$.)

  15. Support Vector Regression. A large $\|w\|$ corresponds to a large slope for $f$: in the linear case $y = \langle w, x \rangle + b$, $f$ is more vertical. The flatter the slope of the function $f$, the larger the $\epsilon$-margin. To maximize the margin, we must minimize the norm of $w$. (Figure: the $\epsilon$-margin around the line $y = \langle w, x \rangle + b$.)

  16. Support Vector Regression. This can be rephrased as a constraint-based optimization problem of the form:
  minimize $\quad \frac{1}{2}\|w\|^2$
  subject to $\quad \langle w, x^i \rangle + b - y^i \le \epsilon$ and $y^i - \langle w, x^i \rangle - b \le \epsilon$, $\quad i = 1, \dots, M$.
  We still need to penalize the points that lie outside the $\epsilon$-insensitive tube.

  17. Support Vector Regression. Introduce slack variables $\xi_i, \xi_i^* \ge 0$ and a constant $C$ to penalize the points outside the $\epsilon$-insensitive tube:
  minimize $\quad \frac{1}{2}\|w\|^2 + \frac{C}{M}\sum_{i=1}^{M}(\xi_i + \xi_i^*)$
  subject to $\quad \langle w, x^i \rangle + b - y^i \le \epsilon + \xi_i$, $\quad y^i - \langle w, x^i \rangle - b \le \epsilon + \xi_i^*$, $\quad \xi_i \ge 0$, $\xi_i^* \ge 0$, $\quad i = 1, \dots, M$.

  18. Support Vector Regression. With the slack variables $\xi_i, \xi_i^* \ge 0$ and the constant $C$, all points outside the $\epsilon$-tube become support vectors:
  minimize $\quad \frac{1}{2}\|w\|^2 + \frac{C}{M}\sum_{i=1}^{M}(\xi_i + \xi_i^*)$
  subject to $\quad \langle w, x^i \rangle + b - y^i \le \epsilon + \xi_i$, $\quad y^i - \langle w, x^i \rangle - b \le \epsilon + \xi_i^*$, $\quad \xi_i \ge 0$, $\xi_i^* \ge 0$, $\quad i = 1, \dots, M$.
  We now have the solution to the linear regression problem. How do we generalize this to the non-linear case?
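For concreteness, here is a sketch of solving this linear primal problem numerically with the cvxpy modelling library (not mentioned in the slides); the synthetic data and the values of $C$ and $\epsilon$ are illustrative assumptions.

```python
import numpy as np
import cvxpy as cp

# synthetic 1-D training data arranged as an (M, 1) design matrix (illustrative)
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, (30, 1))
y = 2.0 * X[:, 0] - 1.0 + 0.05 * rng.standard_normal(30)

M, d = X.shape
eps, C = 0.1, 10.0

w = cp.Variable(d)
b = cp.Variable()
xi = cp.Variable(M, nonneg=True)     # slack above the tube
xi_s = cp.Variable(M, nonneg=True)   # slack below the tube

pred = X @ w + b
objective = cp.Minimize(0.5 * cp.sum_squares(w) + (C / M) * cp.sum(xi + xi_s))
constraints = [pred - y <= eps + xi,
               y - pred <= eps + xi_s]
cp.Problem(objective, constraints).solve()

# support vectors: points on or outside the eps-tube
sv = np.abs(X @ w.value + b.value - y) >= eps - 1e-6
print("support vectors:", int(np.sum(sv)), "out of", M)
```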

  19. Support Vector Regression. Lift $x$ into feature space and then perform linear regression in feature space. Linear case: $y = f(x) = \langle w, x \rangle + b$. Non-linear case: $x \to \phi(x)$, $y = f(x) = \langle w, \phi(x) \rangle + b$. Note that $w$ now lives in feature space!
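A quick numeric check of the lifting idea, using an explicit quadratic feature map as an illustrative assumption (the slides do not fix a particular $\phi$): the inner product in feature space can be evaluated through a kernel without ever building $\phi(x)$.

```python
import numpy as np

def phi(x):
    """Explicit quadratic feature map for a scalar input: phi(x) = (1, sqrt(2) x, x^2)."""
    return np.array([1.0, np.sqrt(2.0) * x, x ** 2])

def k(x, z):
    """Polynomial kernel k(x, z) = (1 + x z)^2 = <phi(x), phi(z)>."""
    return (1.0 + x * z) ** 2

x, z = 0.3, -1.2
assert np.isclose(phi(x) @ phi(z), k(x, z))   # same value, no explicit lifting needed
```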

  20. Support Vector Regression. In feature space, we obtain the same constrained optimization problem:
  minimize $\quad \frac{1}{2}\|w\|^2 + \frac{C}{M}\sum_{i=1}^{M}(\xi_i + \xi_i^*)$
  subject to $\quad \langle w, \phi(x^i) \rangle + b - y^i \le \epsilon + \xi_i$, $\quad y^i - \langle w, \phi(x^i) \rangle - b \le \epsilon + \xi_i^*$, $\quad \xi_i \ge 0$, $\xi_i^* \ge 0$, $\quad i = 1, \dots, M$.
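In practice, off-the-shelf solvers handle this kernelized problem. A minimal sketch with scikit-learn's SVR on synthetic data (an illustrative choice; note that scikit-learn's C is not divided by M, so its scaling differs slightly from the formulation above).

```python
import numpy as np
from sklearn.svm import SVR

# synthetic noisy sinusoid (illustrative assumption)
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, (50, 1))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(50)

# the RBF kernel plays the role of the feature map phi
model = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma=10.0)
model.fit(X, y)

print("support vectors:", len(model.support_), "out of", len(X))
y_pred = model.predict(X)
```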

  21. Support Vector Regression. Again, we can solve this quadratic problem by introducing sets of Lagrange multipliers and writing the Lagrangian (Lagrangian = objective function + multipliers $\times$ constraints):
  $L(w, b, \xi, \xi^*) = \frac{1}{2}\|w\|^2 + \frac{C}{M}\sum_{i=1}^{M}(\xi_i + \xi_i^*) - \sum_{i=1}^{M}(\eta_i \xi_i + \eta_i^* \xi_i^*) - \sum_{i=1}^{M}\alpha_i\left(\epsilon + \xi_i - y^i + \langle w, \phi(x^i) \rangle + b\right) - \sum_{i=1}^{M}\alpha_i^*\left(\epsilon + \xi_i^* + y^i - \langle w, \phi(x^i) \rangle - b\right)$
  with Lagrange multipliers $\alpha_i, \alpha_i^*, \eta_i, \eta_i^* \ge 0$.

  22. Support Vector Regression. Requiring that the partial derivatives are all zero:
  $\frac{\partial L}{\partial b} = \sum_{i=1}^{M}(\alpha_i^* - \alpha_i) = 0; \quad \frac{\partial L}{\partial w} = w - \sum_{i=1}^{M}(\alpha_i - \alpha_i^*)\,\phi(x^i) = 0; \quad \frac{\partial L}{\partial \xi_i^{(*)}} = \frac{C}{M} - \alpha_i^{(*)} - \eta_i^{(*)} = 0$
  and replacing in the primal Lagrangian, we get the dual optimization problem:
  $\max_{\alpha, \alpha^*} \; -\frac{1}{2}\sum_{i,j=1}^{M}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\,k(x^i, x^j) - \epsilon\sum_{i=1}^{M}(\alpha_i + \alpha_i^*) + \sum_{i=1}^{M} y^i(\alpha_i - \alpha_i^*)$
  subject to $\sum_{i=1}^{M}(\alpha_i - \alpha_i^*) = 0$ and $\alpha_i, \alpha_i^* \in \left[0, \frac{C}{M}\right]$.
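A sketch of solving this dual numerically with cvxpy and recovering the regression function $f(x) = \sum_i (\alpha_i - \alpha_i^*)\,k(x^i, x) + b$ (with $b$ omitted, as in the slides). The RBF kernel, the synthetic data, and the Cholesky jitter used to keep the quadratic form numerically positive semidefinite are illustrative assumptions.

```python
import numpy as np
import cvxpy as cp

def rbf(a, b, gamma=10.0):
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

# synthetic noisy sinusoid (illustrative assumption)
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 30)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(30)

M, eps, C = len(x), 0.1, 10.0
K = rbf(x, x)
Kc = np.linalg.cholesky(K + 1e-8 * np.eye(M))   # K ~ Kc Kc^T, keeps the QP convex

a = cp.Variable(M, nonneg=True)      # alpha
a_s = cp.Variable(M, nonneg=True)    # alpha*
d = a - a_s

objective = cp.Maximize(-0.5 * cp.sum_squares(Kc.T @ d)   # = -(1/2) d^T K d
                        - eps * cp.sum(a + a_s)
                        + y @ d)
constraints = [cp.sum(d) == 0, a <= C / M, a_s <= C / M]
cp.Problem(objective, constraints).solve()

coef = a.value - a_s.value
support = np.abs(coef) > 1e-6                           # points on or outside the eps-tube
f = lambda x_new: rbf(np.atleast_1d(x_new), x) @ coef   # b omitted, as in the slides
print("support vectors:", int(np.sum(support)), "out of", M)
```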
