ADVANCED MACHINE LEARNING
Non-linear regression techniques (SVR and extensions, GPR, Gradient Boosting)
Regression: Principle
Map an N-dimensional input \( x \in \mathbb{R}^N \) to a continuous output \( y \in \mathbb{R} \).
Learn a function of the type \( f : \mathbb{R}^N \to \mathbb{R}, \; y = f(x) \).
[Figure: true function vs. estimate, with training points \( (x^1, y^1), \dots, (x^4, y^4) \).]
Estimate the \( f \) that best predicts the set of training points \( \{x^i, y^i\}_{i=1,\dots,M} \).
Regression: Issues
Map an N-dimensional input \( x \in \mathbb{R}^N \) to a continuous output \( y \in \mathbb{R} \), learning a function \( f : \mathbb{R}^N \to \mathbb{R}, \; y = f(x) \).
The fit is strongly influenced by the choice of:
- the datapoints used for training
- the complexity of the model (interpolation)
[Figure: true function vs. estimate on the training points \( (x^1, y^1), \dots, (x^4, y^4) \).]
Estimate the \( f \) that best predicts the set of training points \( \{x^i, y^i\}_{i=1,\dots,M} \).
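To make the influence of training data and model complexity concrete, here is a small illustrative sketch (not from the slides; the data and polynomial degrees are hypothetical choices): the same noisy samples are fitted with a rigid and a very flexible model.

```python
import numpy as np

# Hypothetical 1-D training set: a smooth function observed with noise
rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0.0, 4.0, size=8))
y_train = np.sin(x_train) + 0.1 * rng.standard_normal(8)

# Two models of different complexity fitted to the same points
coeffs_simple = np.polyfit(x_train, y_train, deg=1)    # rigid model, may underfit
coeffs_flexible = np.polyfit(x_train, y_train, deg=7)  # interpolates the training noise

x_test = np.linspace(0.0, 4.0, 5)
print(np.polyval(coeffs_simple, x_test))    # smooth but biased estimate
print(np.polyval(coeffs_flexible, x_test))  # hits the training points, oscillates elsewhere
```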
Regression Algorithms in this Course
- Support vector regression (Support Vector Machine)
- Relevance vector regression (Relevance Vector Machine)
- Gaussian process regression (Gaussian Process)
- Boosting approaches: random projections, random Gaussians, random forest, gradient boosting
Not covered in class: locally weighted projected regression.
Regression Algorithms in this Course
This part of the course: support vector regression (Support Vector Machine), relevance vector regression (Relevance Vector Machine), and boosting with random projections.
Support Vector Regression
Support Vector Regression
Assume a nonlinear mapping \( f \), s.t. \( y = f(x) \). How do we estimate \( f \) to best predict the pairs of training points \( \{x^i, y^i\}_{i=1,\dots,M} \)?
How can the support vector machine framework for classification be generalized to estimate continuous functions?
1. Assume a non-linear mapping into a feature space and then perform linear regression in that feature space.
2. Supervised learning minimizes an error function, so we first need a way to measure the prediction error; we start with the linear case.
Support Vector Regression
Assume a linear mapping \( f \), s.t. \( y = f(x) = w^T x + b \).
How do we estimate \( w \) and \( b \) to best predict the pairs of training points \( \{x^i, y^i\}_{i=1,\dots,M} \)?
(Note: \( b \) is estimated, as in SVM, through least-squares regression on the support vectors; hence we omit it from the rest of the development.)
[Figure: linear fit \( y = f(x) = w^T x + b \); measure the error on prediction.]
Support Vector Regression
Set an upper bound \( \varepsilon \) on the error and consider as correctly predicted all points such that \( |f(x) - y| \le \varepsilon \).
Penalize only datapoints that are not contained in the \( \varepsilon \)-tube.
[Figure: \( \varepsilon \)-tube around \( y = f(x) = w^T x + b \).]
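As a small illustration (not from the slides), the penalty just described corresponds to the \( \varepsilon \)-insensitive loss; a minimal sketch, with an arbitrarily chosen \( \varepsilon \):

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Zero inside the eps-tube, linear penalty |f(x) - y| - eps outside it."""
    residual = np.abs(y_true - y_pred)
    return np.maximum(0.0, residual - eps)

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.05, 2.5, 2.0])
print(eps_insensitive_loss(y_true, y_pred, eps=0.1))  # [0.  0.4 0.9]
```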
Support Vector Regression
The \( \varepsilon \)-margin is a measure of the width of the \( \varepsilon \)-insensitive tube; it is a measure of the precision of the regression.
A small \( \|w\| \) corresponds to a small slope for \( f \): in the linear case, \( f \) is more horizontal.
[Figure: \( \varepsilon \)-margin for a flat linear fit.]
Support Vector Regression
A large \( \|w\| \) corresponds to a large slope for \( f \): in the linear case, \( f \) is more vertical.
The flatter the slope of the function \( f \), the larger the \( \varepsilon \)-margin. To maximize the margin, we must minimize the norm of \( w \).
[Figure: \( \varepsilon \)-margin for a steep linear fit.]
Support Vector Regression
This can be rephrased as a constrained optimization problem of the form:
\[
\min_{w, b} \; \frac{1}{2}\|w\|^2
\quad \text{subject to} \quad
\begin{cases}
\langle w, x^i \rangle + b - y^i \le \varepsilon \\
y^i - \langle w, x^i \rangle - b \le \varepsilon
\end{cases}
\quad \forall i = 1, \dots, M
\]
We still need to penalize points outside the \( \varepsilon \)-insensitive tube.
Support Vector Regression
Introduce slack variables \( \xi_i, \xi_i^* \ge 0 \) to penalize points outside the \( \varepsilon \)-insensitive tube:
\[
\min_{w, b, \xi, \xi^*} \; \frac{1}{2}\|w\|^2 + \frac{C}{M}\sum_{i=1}^{M}\left(\xi_i + \xi_i^*\right)
\quad \text{subject to} \quad
\begin{cases}
\langle w, x^i \rangle + b - y^i \le \varepsilon + \xi_i \\
y^i - \langle w, x^i \rangle - b \le \varepsilon + \xi_i^* \\
\xi_i \ge 0, \; \xi_i^* \ge 0
\end{cases}
\]
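A minimal sketch of this soft-constrained primal, written directly with a generic convex solver (cvxpy); the data, \( \varepsilon \), and \( C \) are hypothetical choices, not values from the slides:

```python
import numpy as np
import cvxpy as cp

# Hypothetical 1-D training data around a linear trend
rng = np.random.default_rng(0)
M, eps, C = 40, 0.1, 1.0
X = rng.uniform(-2.0, 2.0, size=(M, 1))
y = 1.5 * X.ravel() - 0.5 + 0.2 * rng.standard_normal(M)

w = cp.Variable(1)
b = cp.Variable()
xi = cp.Variable(M, nonneg=True)       # slack for f(x) - y > eps
xi_star = cp.Variable(M, nonneg=True)  # slack for y - f(x) > eps

residual = X @ w + b - y
objective = cp.Minimize(0.5 * cp.sum_squares(w) + (C / M) * cp.sum(xi + xi_star))
constraints = [residual <= eps + xi, -residual <= eps + xi_star]
cp.Problem(objective, constraints).solve()

print("w =", w.value, " b =", b.value)
```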
Support Vector Regression
With the slack variables \( \xi_i, \xi_i^* \ge 0 \) and the optimization problem above, all points outside the \( \varepsilon \)-tube become support vectors.
We now have the solution to the linear regression problem. How do we generalize it to the nonlinear case?
Support Vector Regression
Lift \( x \) into a feature space and then perform linear regression in that feature space.
Linear case: \( y = f(x) = \langle w, x \rangle + b \)
Non-linear case: \( x \to \phi(x), \quad y = f(x) = \langle w, \phi(x) \rangle + b \)
Note that \( w \) now lives in feature space.
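An illustrative sketch (not from the slides): an explicit polynomial feature map \( \phi \) followed by linear SVR in that feature space. In practice the map is kept implicit through a kernel, as developed on the next slides; the degree, \( \varepsilon \), and \( C \) below are arbitrary choices.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import LinearSVR

rng = np.random.default_rng(1)
X = rng.uniform(-2.0, 2.0, size=(200, 1))
y = np.sin(2.0 * X.ravel()) + 0.05 * rng.standard_normal(200)

# phi(x): explicit polynomial features, then linear epsilon-insensitive regression
model = make_pipeline(PolynomialFeatures(degree=5),
                      LinearSVR(epsilon=0.05, C=10.0, max_iter=50000))
model.fit(X, y)

print(model.predict(np.array([[0.5]])))  # close to sin(1.0) ~ 0.84
```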
Support Vector Regression
In feature space, we obtain the same constrained optimization problem:
\[
\min_{w, b, \xi, \xi^*} \; \frac{1}{2}\|w\|^2 + \frac{C}{M}\sum_{i=1}^{M}\left(\xi_i + \xi_i^*\right)
\quad \text{subject to} \quad
\begin{cases}
\langle w, \phi(x^i) \rangle + b - y^i \le \varepsilon + \xi_i \\
y^i - \langle w, \phi(x^i) \rangle - b \le \varepsilon + \xi_i^* \\
\xi_i \ge 0, \; \xi_i^* \ge 0
\end{cases}
\]
Support Vector Regression
Again, we can solve this quadratic problem by introducing sets of Lagrange multipliers and writing the Lagrangian (Lagrangian = objective function + multipliers \( \times \) constraints):
\[
\begin{aligned}
L(w, b, \xi, \xi^*) = \; & \frac{1}{2}\|w\|^2 + \frac{C}{M}\sum_{i=1}^{M}\left(\xi_i + \xi_i^*\right) - \sum_{i=1}^{M}\left(\eta_i \xi_i + \eta_i^* \xi_i^*\right) \\
& - \sum_{i=1}^{M} \alpha_i \left(\varepsilon + \xi_i + y^i - \langle w, \phi(x^i) \rangle - b\right) \\
& - \sum_{i=1}^{M} \alpha_i^* \left(\varepsilon + \xi_i^* - y^i + \langle w, \phi(x^i) \rangle + b\right)
\end{aligned}
\]
Support Vector Regression
By complementary slackness, \( \alpha_i = \alpha_i^* = 0 \) for all points that satisfy the constraints with strict inequality, i.e. points strictly inside the \( \varepsilon \)-tube. Points on or outside the tube have \( \alpha_i > 0 \) or \( \alpha_i^* > 0 \) (with slacks \( \xi_i, \xi_i^* \)), the two constraints corresponding to points lying on either side of the \( \varepsilon \)-tube.
[Figure: multipliers and slacks for points outside the \( \varepsilon \)-tube; the Lagrangian is the one given on the previous slide.]
Support Vector Regression
Requiring that the partial derivatives of the Lagrangian are zero:
\[
\frac{\partial L}{\partial b} = \sum_{i=1}^{M} \left(\alpha_i - \alpha_i^*\right) = 0
\quad \Rightarrow \quad \sum_{i=1}^{M} \alpha_i = \sum_{i=1}^{M} \alpha_i^*
\]
(rebalancing the effect of the support vectors on both sides of the \( \varepsilon \)-tube), and
\[
\frac{\partial L}{\partial w} = w - \sum_{i=1}^{M} \left(\alpha_i^* - \alpha_i\right) \phi(x^i) = 0
\quad \Rightarrow \quad w = \sum_{i=1}^{M} \left(\alpha_i^* - \alpha_i\right) \phi(x^i)
\]
so \( w \) is a linear combination of the support vectors.
Support Vector Regression
Replacing these expressions in the primal Lagrangian, we get the dual optimization problem (kernel trick: \( k(x^i, x^j) = \langle \phi(x^i), \phi(x^j) \rangle \)):
\[
\max_{\alpha, \alpha^*} \;
- \frac{1}{2} \sum_{i,j=1}^{M} \left(\alpha_i^* - \alpha_i\right)\left(\alpha_j^* - \alpha_j\right) k(x^i, x^j)
- \varepsilon \sum_{i=1}^{M} \left(\alpha_i + \alpha_i^*\right)
+ \sum_{i=1}^{M} y^i \left(\alpha_i^* - \alpha_i\right)
\]
\[
\text{subject to} \quad \sum_{i=1}^{M} \left(\alpha_i - \alpha_i^*\right) = 0
\quad \text{and} \quad \alpha_i, \alpha_i^* \in \left[0, \tfrac{C}{M}\right]
\]
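The dual only needs inner products in feature space, so the kernel is evaluated directly. A minimal sketch (not from the slides) of the Gram matrix \( K_{ij} = k(x^i, x^j) \) for an RBF kernel, with an assumed bandwidth parameter gamma:

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    """RBF kernel k(x, x') = exp(-gamma * ||x - x'||^2) for all pairs of rows."""
    sq_dists = (np.sum(X1**2, axis=1)[:, None]
                + np.sum(X2**2, axis=1)[None, :]
                - 2.0 * X1 @ X2.T)
    return np.exp(-gamma * sq_dists)

X = np.array([[0.0], [1.0], [2.0]])
K = rbf_kernel(X, X, gamma=0.5)  # M x M Gram matrix used in the dual objective
print(K)
```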
Support Vector Regression
The solution is given by:
\[
y = f(x) = \sum_{i=1}^{M} \left(\alpha_i^* - \alpha_i\right) k(x^i, x) + b
\]
The terms \( (\alpha_i^* - \alpha_i) \) are linear coefficients (Lagrange multipliers, one pair per constraint). If one uses an RBF kernel, \( f \) is a sum of \( M \) un-normalized isotropic Gaussians centered on each training datapoint.
Support Vector Regression
The solution is given by:
\[
y = f(x) = \sum_{i=1}^{M} \left(\alpha_i^* - \alpha_i\right) k(x^i, x) + b
\]
The kernel places a Gaussian function on each support vector.
[Figure: regression function built from Gaussian kernels centered on the support vectors.]
Support Vector Regression
The solution is given by:
\[
y = f(x) = \sum_{i=1}^{M} \left(\alpha_i^* - \alpha_i\right) k(x^i, x) + b
\]
The Lagrange multipliers define the importance of each Gaussian function, and \( f(x) \) converges to \( b \) where the effect of the support vectors vanishes.
[Figure: \( y = f(x) \) built from six support vectors \( x^1, \dots, x^6 \), each weighted by its Lagrange multiplier (values between 1 and 3 in the example).]
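To connect the formula to practice, a sketch (not from the slides) that fits an RBF-kernel SVR with scikit-learn and then rebuilds the prediction by hand from the support vectors, their signed multipliers, and \( b \); the data and hyperparameters are arbitrary choices:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(2)
X = rng.uniform(-3.0, 3.0, size=(150, 1))
y = np.sin(X.ravel()) + 0.05 * rng.standard_normal(150)

svr = SVR(kernel="rbf", C=10.0, epsilon=0.05, gamma=0.5)
svr.fit(X, y)

# Rebuild f(x) = sum_i (signed multiplier_i) * k(x^i, x) + b from the fitted model;
# dual_coef_ holds the signed Lagrange multiplier of each support vector.
X_new = np.array([[0.3], [1.2]])
K = rbf_kernel(svr.support_vectors_, X_new, gamma=0.5)  # k(x^i, x) for each SV
manual = svr.dual_coef_ @ K + svr.intercept_

print(manual.ravel())
print(svr.predict(X_new))  # matches the manual reconstruction
```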