MACHINE LEARNING – 2012

Kernel Methods for Regression:
Support Vector Regression
Gaussian Mixture Regression
Gaussian Process Regression
Problem Statement

Predict output $y$ given input $x$ through a non-linear function $f$: $y = f(x)$.

Estimate the $f$ that best predicts a set of training points $\{x^i, y^i\}_{i=1,\dots,M}$.

[Figure: training points $(x^1, y^1), \dots, (x^4, y^4)$ in the $x$–$y$ plane.]
Non-linear Regression and the Kernel Trick

Non-linear regression: fit the data with a function that is not linear in its parameters, $y = f(x; \theta)$, where $\theta$ denotes the parameters of the function.

Non-parametric regression: use the data themselves to determine the parameters of the function, so that the problem can again be phrased as a linear regression problem.

Kernel trick: send the data into a feature space with a non-linear function and perform linear regression in feature space:

$y = \sum_i \alpha_i \, k(x, x^i)$,    $x^i$: datapoints, $k$: kernel function.
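A minimal sketch of the kernel-trick prediction above: fit the coefficients $\alpha_i$ by (ridge-regularised) linear regression on the kernel matrix, then predict as a weighted sum of kernel evaluations against the training points. The RBF kernel, bandwidth `gamma`, regulariser `lam`, and toy data are illustrative assumptions, not prescribed by the slides.

```python
import numpy as np

def rbf_kernel(A, B, gamma=10.0):
    """k(a, b) = exp(-gamma * ||a - b||^2), evaluated for all pairs of rows."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

# Toy training set {x^i, y^i}, i = 1..M
X = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * np.random.randn(20)

# Linear regression in feature space: solve (K + lam*I) alpha = y
K = rbf_kernel(X, X)
lam = 1e-3
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

# Prediction at new inputs: y(x) = sum_i alpha_i k(x, x^i)
X_test = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
y_pred = rbf_kernel(X_test, X) @ alpha
```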
Data-driven Regression

Good prediction depends on the choice of datapoints. The more datapoints, the better the fit, but the computational cost increases dramatically with the number of datapoints.

[Figure: blue = true function, red = estimated function.]
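To make the cost remark concrete: in kernel methods the training points enter through an $M \times M$ kernel matrix, so memory grows as $O(M^2)$ and a direct solve as $O(M^3)$. A rough timing sketch, reusing the kernel ridge setting assumed above:

```python
import time
import numpy as np

def rbf_gram(X, gamma=10.0):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

for M in (200, 400, 800):
    X = np.random.rand(M, 1)
    y = np.sin(2 * np.pi * X[:, 0])
    t0 = time.perf_counter()
    K = rbf_gram(X)                                     # O(M^2) memory
    alpha = np.linalg.solve(K + 1e-3 * np.eye(M), y)    # O(M^3) time
    print(M, f"{time.perf_counter() - t0:.3f} s")
```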
Kernel Methods for Regression

There are several methods in ML for performing non-linear regression. They differ in their objective function and in the number of parameters.

- Gaussian Process Regression (GPR) uses all datapoints.
- Support Vector Regression (SVR) picks a subset of the datapoints (the support vectors).
- Gaussian Mixture Regression (GMR) generates a new set of datapoints (the centers of the Gaussian functions).

[Figure: blue = true function, red = estimated function.]
Kernel Methods for Regression

Deterministic regressive model: $y = f(x)$, for training pairs $\{x, y\}$.

Probabilistic regressive model: $y = f(x) + \epsilon$, with $\epsilon \sim N(0, \sigma^2)$.

Build an estimate of the noise model and then compute $f$ directly (Support Vector Regression).

[Figure: $y$ plotted against $x$.]
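As a concrete illustration of the probabilistic model above (the true function, noise level, and sample size are assumptions for the example, not from the slides), training data can be sampled as the underlying function plus Gaussian noise:

```python
import numpy as np

rng = np.random.default_rng(0)

def f_true(x):
    """An assumed underlying function for illustration."""
    return np.sin(2 * np.pi * x)

M = 50
sigma = 0.1                               # noise standard deviation
x = rng.uniform(0.0, 1.0, size=M)
eps = rng.normal(0.0, sigma, size=M)      # eps ~ N(0, sigma^2)
y = f_true(x) + eps                       # probabilistic regressive model
```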
Support Vector Regression
Support Vector Regression (SVR)

Assume a non-linear mapping $f$, s.t. $y = f(x)$. How do we estimate the $f$ that best predicts the pairs of training points $\{x^i, y^i\}_{i=1,\dots,M}$?

How do we generalize the support vector machine framework for classification to estimate continuous functions?

1. Assume a non-linear mapping through feature space and then perform linear regression in feature space (supervised learning: minimize an error function).
2. First, determine a way to measure the error on the testing set in the linear case.
Support Vector Regression

Assume a linear mapping $f$, s.t. $y = f(x) = \langle w, x \rangle + b$.

How do we estimate $w$ and $b$ to best predict the pairs of training points $\{x^i, y^i\}_{i=1,\dots,M}$?

Measure the error on the prediction.

($b$ is estimated, as in SVR, through least-squares regression on the support vectors; hence we omit it from the rest of the development.)

[Figure: $y = f(x)$ plotted against $x$.]
Support Vector Regression

Set an upper bound $\epsilon$ on the error and consider as correctly classified all points such that $|f(x) - y| \le \epsilon$.

Penalize only datapoints that are not contained in the $\epsilon$-tube.

[Figure: regression line with the $\epsilon$-insensitive tube, bounded by $+\epsilon$ and $-\epsilon$.]
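A small sketch of the $\epsilon$-insensitive error measure described above (the example values of `eps` and the data are placeholders): points inside the tube contribute zero error, points outside are penalized linearly by their distance to the tube.

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Zero inside the eps-tube, |error| - eps outside it."""
    err = np.abs(y_true - y_pred)
    return np.maximum(0.0, err - eps)

# Only the second and third points fall outside the tube and get penalized.
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.05, 2.5, 2.0])
print(eps_insensitive_loss(y_true, y_pred, eps=0.1))   # [0.  0.4 0.9]
```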
Support Vector Regression

The $\epsilon$-margin is a measure of the width of the $\epsilon$-insensitive tube, and hence of the precision of the regression.

A small $\|w\|$ corresponds to a small slope for $f$. In the linear case, $f$ is more horizontal.

[Figure: $y = \langle w, x \rangle + b$ with the $\epsilon$-margin indicated along $x$.]
Support Vector Regression

A large $\|w\|$ corresponds to a large slope for $f$. In the linear case, $f$ is more vertical.

The flatter the slope of the function $f$, the larger the margin. To maximize the margin, we must minimize the norm of $w$.

[Figure: $y = \langle w, x \rangle + b$ with the $\epsilon$-margin indicated along $x$.]
Support Vector Regression

This can be rephrased as a constraint-based optimization problem of the form:

minimize   $\frac{1}{2}\|w\|^2$

subject to   $\langle w, x^i \rangle + b - y^i \le \epsilon$,
             $y^i - \langle w, x^i \rangle - b \le \epsilon$,   $i = 1,\dots,M$

We still need to penalize points outside the $\epsilon$-insensitive tube.
Support Vector Regression

To penalize points outside the $\epsilon$-insensitive tube, introduce slack variables $\xi_i, \xi_i^* \ge 0$ and a penalty constant $C > 0$:

minimize   $\frac{1}{2}\|w\|^2 + \frac{C}{M}\sum_{i=1}^{M}(\xi_i + \xi_i^*)$

subject to   $\langle w, x^i \rangle + b - y^i \le \epsilon + \xi_i$,
             $y^i - \langle w, x^i \rangle - b \le \epsilon + \xi_i^*$,
             $\xi_i \ge 0,\ \xi_i^* \ge 0$,   $i = 1,\dots,M$
Support Vector Regression

In this formulation, all points outside the $\epsilon$-tube become support vectors.

We now have the solution to the linear regression problem. How do we generalize this to the non-linear case?
Support Vector Regression

Lift $x$ into feature space and then perform linear regression in feature space.

Linear case:       $y = f(x) = \langle w, x \rangle + b$

Non-linear case:   $x \rightarrow \phi(x)$,   $y = f(x) = \langle w, \phi(x) \rangle + b$

$w$ now lives in feature space!
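A tiny sketch of the lift-then-regress idea (the cubic feature map, toy data, and omission of the bias $b$ are assumptions for illustration; kernel methods avoid ever constructing $\phi(x)$ explicitly):

```python
import numpy as np

def phi(x):
    """Assumed explicit feature map: phi(x) = (x, x^2, x^3)."""
    return np.column_stack([x, x**2, x**3])

rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=30)
y = x**3 - 0.5 * x + 0.05 * rng.normal(size=30)

# Linear least-squares regression in feature space: w lives in R^3, not R^1.
Phi = phi(x)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)

# Prediction f(x) = <w, phi(x)>
y_hat = phi(np.array([0.3])) @ w
```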
Support Vector Regression

In feature space, we obtain the same constrained optimization problem:

minimize   $\frac{1}{2}\|w\|^2 + \frac{C}{M}\sum_{i=1}^{M}(\xi_i + \xi_i^*)$

subject to   $\langle w, \phi(x^i) \rangle + b - y^i \le \epsilon + \xi_i$,
             $y^i - \langle w, \phi(x^i) \rangle - b \le \epsilon + \xi_i^*$,
             $\xi_i \ge 0,\ \xi_i^* \ge 0$,   $i = 1,\dots,M$
Support Vector Regression

Again, we can solve this quadratic problem by introducing sets of Lagrange multipliers $\alpha_i, \alpha_i^*, \eta_i, \eta_i^* \ge 0$ and writing the Lagrangian (Lagrangian = objective function + multipliers $\times$ constraints):

$L(w, b, \xi, \xi^*; \alpha, \alpha^*, \eta, \eta^*) = \frac{1}{2}\|w\|^2 + \frac{C}{M}\sum_{i=1}^{M}\xi_i + \frac{C}{M}\sum_{i=1}^{M}\xi_i^* - \sum_{i=1}^{M}(\eta_i \xi_i + \eta_i^* \xi_i^*)$

$\quad - \sum_{i=1}^{M} \alpha_i \big(\epsilon + \xi_i - y^i + \langle w, \phi(x^i) \rangle + b\big) - \sum_{i=1}^{M} \alpha_i^* \big(\epsilon + \xi_i^* + y^i - \langle w, \phi(x^i) \rangle - b\big)$
Support Vector Regression

Requiring that the partial derivatives are all zero:

$\frac{\partial L}{\partial b} = \sum_{i=1}^{M}(\alpha_i^* - \alpha_i) = 0$

$\frac{\partial L}{\partial w} = w - \sum_{i=1}^{M}(\alpha_i - \alpha_i^*)\,\phi(x^i) = 0$

$\frac{\partial L}{\partial \xi_i^{(*)}} = \frac{C}{M} - \alpha_i^{(*)} - \eta_i^{(*)} = 0$

and substituting back into the primal Lagrangian, we get the dual optimization problem:

$\max_{\alpha, \alpha^*} \; -\frac{1}{2}\sum_{i,j=1}^{M}(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\,k(x^i, x^j) - \epsilon\sum_{i=1}^{M}(\alpha_i + \alpha_i^*) + \sum_{i=1}^{M} y^i(\alpha_i - \alpha_i^*)$

subject to   $\sum_{i=1}^{M}(\alpha_i - \alpha_i^*) = 0$   and   $\alpha_i, \alpha_i^* \in \left[0, \tfrac{C}{M}\right]$
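In practice this dual QP is handed to an off-the-shelf solver. A minimal sketch using scikit-learn's SVR as one such solver (an illustrative choice; the toy data and hyperparameter values are placeholders, and scikit-learn's C plays the role of the C/M penalty above):

```python
import numpy as np
from sklearn.svm import SVR

# Toy training set {x^i, y^i}
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(50, 1))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.normal(size=50)

# eps-insensitive SVR with an RBF kernel k(x, x') = exp(-gamma * ||x - x'||^2)
model = SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma=10.0)
model.fit(X, y)

# Only points outside (or on the boundary of) the eps-tube become support vectors;
# dual_coef_ stores alpha_i - alpha_i^* for those points.
print("number of support vectors:", len(model.support_))
print("dual coefficients shape:", model.dual_coef_.shape)

y_pred = model.predict(np.linspace(0.0, 1.0, 5).reshape(-1, 1))
```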