ADVANCED MACHINE LEARNING
Non-linear regression techniques
Regression: Principle

Map an N-dimensional input $x \in \mathbb{R}^N$ to a continuous output $y \in \mathbb{R}$. Learn a function of the type:

$$f: \mathbb{R}^N \to \mathbb{R}, \quad y = f(x)$$

[Figure: training points $(x^1, y^1), \dots, (x^4, y^4)$ with the true function and its estimate.]

How to find the estimate $f$ that best predicts the set of training points $\{x^i, y^i\}, \ i = 1, \dots, M$?
Regression: Issues

Map an N-dimensional input $x \in \mathbb{R}^N$ to a continuous output $y \in \mathbb{R}$, learning a function of the type $f: \mathbb{R}^N \to \mathbb{R}, \ y = f(x)$.

The fit is strongly influenced by the choice of:
- the datapoints used for training
- the complexity of the model (interpolation)

[Figure: the same training points with the true function and the estimate.]

How to find the estimate $f$ that best predicts the set of training points $\{x^i, y^i\}, \ i = 1, \dots, M$?
Regression Algorithms in this Course

- Support vector regression
- Relevance vector regression
- Boosting – random projections
- Boosting – random Gaussians
- Random forest
- Gradient boosting
- Gaussian process regression
- Locally weighted projected regression
Today, we will see:

- Support vector regression
- Relevance vector regression
- Boosting – random projections
Support Vector Regression
Support Vector Regression

Assume a nonlinear mapping $f$, s.t. $y = f(x)$. How to estimate $f$ to best predict the pairs of training points $\{x^i, y^i\}, \ i = 1, \dots, M$?

How to generalize the support vector machine framework for classification to estimate continuous functions?

1. Assume a non-linear mapping into a feature space, and then perform linear regression in that feature space.
2. Since this is supervised learning, which minimizes an error function, first determine a way to measure the prediction error in the linear case.
Support Vector Regression

Assume a linear mapping, s.t. $y = f(x) = \langle w, x \rangle + b$.

How to estimate $w$ and $b$ to best predict the pairs of training points $\{x^i, y^i\}, \ i = 1, \dots, M$?

[Figure: a linear fit $y = f(x) = \langle w, x \rangle + b$; the error is measured between each prediction and its datapoint.]

Measure the error on the prediction.
Support Vector Regression

Set an upper bound $\varepsilon$ on the error and consider as correctly predicted all points such that $|f(x) - y| \leq \varepsilon$.

[Figure: the $\varepsilon$-insensitive tube of half-width $\varepsilon$ around $f(x) = \langle w, x \rangle + b$.]

Penalize only datapoints that are not contained in the $\varepsilon$-tube.
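As a concrete illustration (my addition, not from the slides), the $\varepsilon$-insensitive loss implied here can be written in a few lines of Python; `epsilon` plays the role of the bound above:

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, epsilon):
    """max(0, |y - f(x)| - epsilon): zero inside the eps-tube,
    growing linearly outside it."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

# Points within epsilon of the prediction incur no penalty at all
print(eps_insensitive_loss(np.array([1.0, 2.0, 3.5]),
                           np.array([1.2, 2.0, 2.0]), epsilon=0.5))
# -> [0.  0.  1.]
```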
Support Vector Regression

The $\varepsilon$-margin is a measure of the width of the $\varepsilon$-insensitive tube, and hence of the precision of the regression.

A small $\|w\|$ corresponds to a small slope for $f$. In the linear case, $f$ is more horizontal.

[Figure: a flat linear fit with a wide $\varepsilon$-margin.]
Support Vector Regression

A large $\|w\|$ corresponds to a large slope for $f$. In the linear case, $f$ is more vertical.

The flatter the slope of the function $f$, the larger the margin. To maximize the margin, we must minimize the norm of $w$.

[Figure: a steep linear fit with a narrow $\varepsilon$-margin.]
Support Vector Regression

This can be rephrased as a constraint-based optimization problem of the form:

$$\min_{w,b} \ \frac{1}{2}\|w\|^2$$

subject to, for $i = 1, \dots, M$:

$$y^i - \langle w, x^i \rangle - b \leq \varepsilon, \qquad \langle w, x^i \rangle + b - y^i \leq \varepsilon$$

This considers as correctly predicted all points such that $|f(x) - y| \leq \varepsilon$; we still need to penalize the points outside the $\varepsilon$-insensitive tube.
Support Vector Regression

To penalize points outside the $\varepsilon$-insensitive tube, introduce slack variables $\xi_i, \xi_i^* \geq 0$ and a penalty factor $C$:

$$\min_{w,b,\xi,\xi^*} \ \frac{1}{2}\|w\|^2 + \frac{C}{M}\sum_{i=1}^{M}\left(\xi_i + \xi_i^*\right)$$

subject to, for $i = 1, \dots, M$:

$$y^i - \langle w, x^i \rangle - b \leq \varepsilon + \xi_i, \qquad \langle w, x^i \rangle + b - y^i \leq \varepsilon + \xi_i^*, \qquad \xi_i, \xi_i^* \geq 0$$
Support Vector Regression

In this formulation, all points outside the $\varepsilon$-tube become support vectors.

We now have the solution to the linear regression problem. How to generalize this to the nonlinear case?
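As a quick sanity check (my addition, not from the slides), scikit-learn's LinearSVR solves essentially this primal problem; note that its C is a global penalty, not divided by M as on the slide, and the data below are hypothetical:

```python
import numpy as np
from sklearn.svm import LinearSVR

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(100, 1))
y = 1.5 * X.ravel() + 0.3 + 0.1 * rng.standard_normal(100)

# epsilon is the half-width of the insensitive tube,
# C weights the slack penalties
model = LinearSVR(epsilon=0.1, C=10.0)
model.fit(X, y)
print(model.coef_, model.intercept_)   # recovered w and b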
Support Vector Regression

We can solve this quadratic problem by introducing two sets of Lagrange multipliers $\alpha_i, \alpha_i^* \geq 0$ and writing the Lagrangian (Lagrangian = objective function + multipliers $\times$ constraints):

$$L(w, \xi, \xi^*, b) = \frac{1}{2}\|w\|^2 + \frac{C}{M}\sum_{i=1}^{M}\left(\xi_i + \xi_i^*\right) - \sum_{i=1}^{M}\alpha_i\left(\varepsilon + \xi_i - y^i + \langle w, x^i \rangle + b\right) - \sum_{i=1}^{M}\alpha_i^*\left(\varepsilon + \xi_i^* + y^i - \langle w, x^i \rangle - b\right)$$
Support Vector Regression

In this Lagrangian, $\alpha_i = \alpha_i^* = 0$ for all points that satisfy the constraints strictly, i.e., the points inside the $\varepsilon$-tube. Multipliers $\alpha_i > 0$ and $\alpha_i^* > 0$ correspond to active constraints, i.e., points lying on or beyond either side of the $\varepsilon$-tube.
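Spelled out (my addition, using the constraint pairing above), these statements are the complementary slackness conditions of the KKT system:

$$\alpha_i\left(\varepsilon + \xi_i - y^i + \langle w, x^i \rangle + b\right) = 0, \qquad \alpha_i^*\left(\varepsilon + \xi_i^* + y^i - \langle w, x^i \rangle - b\right) = 0$$

A multiplier can be nonzero only when its constraint holds with equality, which is exactly why points strictly inside the tube drop out of the solution.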
Support Vector Regression

Requiring that the partial derivatives of $L$ are all zero gives:

$$\frac{\partial L}{\partial b} = \sum_{i=1}^{M}\left(\alpha_i^* - \alpha_i\right) = 0$$

which balances the effect of the support vectors on both sides of the $\varepsilon$-tube, and

$$\frac{\partial L}{\partial w} = w - \sum_{i=1}^{M}\left(\alpha_i - \alpha_i^*\right)x^i = 0 \quad \Rightarrow \quad w = \sum_{i=1}^{M}\left(\alpha_i - \alpha_i^*\right)x^i$$

so $w$ is a linear combination of the support vectors. The solution is given by:

$$y = f(x) = \langle w, x \rangle + b = \sum_{i=1}^{M}\left(\alpha_i - \alpha_i^*\right)\langle x^i, x \rangle + b$$
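To make this expansion concrete (my addition, on hypothetical data): with scikit-learn's SVR and a linear kernel, `dual_coef_` holds exactly the nonzero differences $\alpha_i - \alpha_i^*$, so $w$ can be rebuilt by hand from the support vectors:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(80, 1))
y = 0.8 * X.ravel() - 0.2 + 0.05 * rng.standard_normal(80)

svr = SVR(kernel='linear', C=10.0, epsilon=0.1).fit(X, y)

# dual_coef_ stores (alpha_i - alpha_i*) for the support vectors only
w = svr.dual_coef_ @ svr.support_vectors_   # w = sum_i (a_i - a_i*) x^i
print(np.allclose(w, svr.coef_))            # True: matches sklearn's own w
```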
Support Vector Regression

Lift $x$ into a feature space through a mapping $\phi$ and then perform linear regression in that feature space.

Linear case: $\quad y = f(x) = \langle w, x \rangle + b$

Non-linear case: $\quad x \mapsto \phi(x), \quad y = f(x) = \langle w, \phi(x) \rangle + b$

Note that $w$ now lives in the feature space!
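A small illustration (my addition, not from the slides) of why this lift is tractable: for the degree-2 polynomial map below, the inner product in feature space can be computed directly from inner products in input space, which is what the kernel trick on the next slides exploits.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for 2-D input:
    phi(x) = (x1^2, x2^2, sqrt(2)*x1*x2)."""
    return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

x, z = np.array([1.0, 2.0]), np.array([3.0, 0.5])
lhs = phi(x) @ phi(z)        # inner product computed in feature space
rhs = (x @ z) ** 2           # polynomial kernel k(x, z) = <x, z>^2
print(np.isclose(lhs, rhs))  # True: phi never had to be stored for rhs
```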
Support Vector Regression

In feature space, we obtain the same constrained optimization problem:

$$\min_{w,b,\xi,\xi^*} \ \frac{1}{2}\|w\|^2 + \frac{C}{M}\sum_{i=1}^{M}\left(\xi_i + \xi_i^*\right)$$

subject to, for $i = 1, \dots, M$:

$$y^i - \langle w, \phi(x^i) \rangle - b \leq \varepsilon + \xi_i, \qquad \langle w, \phi(x^i) \rangle + b - y^i \leq \varepsilon + \xi_i^*, \qquad \xi_i, \xi_i^* \geq 0$$
Support Vector Regression

We can solve this quadratic problem by introducing two sets of Lagrange multipliers $\alpha_i, \alpha_i^* \geq 0$ and writing the Lagrangian (Lagrangian = objective function + multipliers $\times$ constraints):

$$L(w, \xi, \xi^*, b) = \frac{1}{2}\|w\|^2 + \frac{C}{M}\sum_{i=1}^{M}\left(\xi_i + \xi_i^*\right) - \sum_{i=1}^{M}\alpha_i\left(\varepsilon + \xi_i - y^i + \langle w, \phi(x^i) \rangle + b\right) - \sum_{i=1}^{M}\alpha_i^*\left(\varepsilon + \xi_i^* + y^i - \langle w, \phi(x^i) \rangle - b\right)$$
Support Vector Regression

Setting the derivatives to zero and replacing in the primal Lagrangian, we get the dual optimization problem:

$$\max_{\alpha, \alpha^*} \ -\frac{1}{2}\sum_{i,j=1}^{M}\left(\alpha_i - \alpha_i^*\right)\left(\alpha_j - \alpha_j^*\right)k\!\left(x^i, x^j\right) - \varepsilon\sum_{i=1}^{M}\left(\alpha_i + \alpha_i^*\right) + \sum_{i=1}^{M} y^i\left(\alpha_i - \alpha_i^*\right)$$

subject to:

$$\sum_{i=1}^{M}\left(\alpha_i - \alpha_i^*\right) = 0 \quad \text{and} \quad \alpha_i, \alpha_i^* \in \left[0, \frac{C}{M}\right]$$

Kernel trick: $k(x^i, x^j) = \langle \phi(x^i), \phi(x^j) \rangle$, so $\phi$ never needs to be computed explicitly.
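For completeness, here is a minimal sketch (my addition, not from the slides) of solving this dual directly as a quadratic program with cvxopt, stacking $z = (\alpha, \alpha^*)$ on hypothetical toy data; a tiny ridge keeps the QP matrix positive definite:

```python
import numpy as np
from cvxopt import matrix, solvers

def rbf(X1, X2, gamma=1.0):
    """Pairwise RBF kernel k(x, x') = exp(-gamma * ||x - x'||^2)."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = np.linspace(0.0, 4.0, 30)[:, None]
y = np.sin(X).ravel() + 0.05 * rng.standard_normal(30)

M, C, eps = len(y), 10.0, 0.05
K = rbf(X, X)

# Minimize 1/2 z^T P z + q^T z, the negative of the dual objective above
P = np.block([[K, -K], [-K, K]]) + 1e-8 * np.eye(2 * M)
q = np.concatenate([eps - y, eps + y])
G = np.vstack([-np.eye(2 * M), np.eye(2 * M)])   # box: 0 <= z <= C/M
h = np.concatenate([np.zeros(2 * M), np.full(2 * M, C / M)])
A = np.concatenate([np.ones(M), -np.ones(M)])    # sum_i (a_i - a_i*) = 0

solvers.options['show_progress'] = False
sol = solvers.qp(matrix(P), matrix(q[:, None]), matrix(G), matrix(h[:, None]),
                 matrix(A[None, :]), matrix(0.0))
z = np.array(sol['x']).ravel()
beta = z[:M] - z[M:]                             # alpha_i - alpha_i*

# b from any free support vector (0 < alpha_i < C/M, so its constraint
# is active with zero slack): b = y^i - sum_j beta_j k(x^j, x^i) - eps
free = (z[:M] > 1e-6) & (z[:M] < C / M - 1e-6)
b = float(np.mean(y[free] - K[free] @ beta - eps)) if free.any() else 0.0

def predict(Xq):
    """Kernel expansion f(x) = sum_i beta_i k(x^i, x) + b."""
    return rbf(Xq, X) @ beta + b
```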
Support Vector Regression

The solution is given by:

$$y = f(x) = \sum_{i=1}^{M}\left(\alpha_i - \alpha_i^*\right)k\!\left(x^i, x\right) + b$$

The linear coefficients $\left(\alpha_i - \alpha_i^*\right)$ are the Lagrange multipliers for each constraint. If one uses an RBF kernel, $f$ is a weighted sum of unnormalized isotropic Gaussians centered on each training datapoint.
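In practice (my addition, with hypothetical settings), scikit-learn's SVR implements exactly this kernel expansion; only the support vectors, i.e., the points on or outside the tube, contribute terms:

```python
import numpy as np
from sklearn.svm import SVR

X = np.linspace(0.0, 4.0, 30)[:, None]
y = np.sin(X).ravel()

# f(x) = sum_i (a_i - a_i*) k(x^i, x) + b: with an RBF kernel, a weighted
# sum of unnormalized Gaussians centered on the support vectors
svr = SVR(kernel='rbf', C=10.0, epsilon=0.05, gamma=1.0).fit(X, y)
print(len(svr.support_vectors_), 'support vectors out of', len(X))
print(svr.predict(np.array([[1.5]])))   # within ~epsilon of sin(1.5)
```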