ADVANCED MACHINE LEARNING
Non-linear regression techniques (SVR and extensions, GPR, Gradient Boosting)
Regression: Principle
Map an N-dimensional input \( x \in \mathbb{R}^N \) to a continuous output \( y \in \mathbb{R} \).
Learn a function of the type \( f : \mathbb{R}^N \to \mathbb{R}, \; y = f(x) \).
[Figure: true function vs. estimate, with training points \( (x^1, y^1), \dots, (x^4, y^4) \).]
Estimate the \( f \) that best predicts the set of training points \( \{x^i, y^i\}_{i=1,\dots,M} \).
Regression: Issues
Map an N-dimensional input \( x \in \mathbb{R}^N \) to a continuous output \( y \in \mathbb{R} \), learning a function \( f : \mathbb{R}^N \to \mathbb{R}, \; y = f(x) \).
The fit is strongly influenced by the choice of:
- the datapoints used for training
- the complexity of the model (interpolation)
[Figure: true function vs. estimate on the training points \( (x^1, y^1), \dots, (x^4, y^4) \).]
Estimate the \( f \) that best predicts the set of training points \( \{x^i, y^i\}_{i=1,\dots,M} \).
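To make the influence of training data and model complexity concrete, here is a small illustrative sketch (not from the slides; the data and polynomial degrees are hypothetical choices): the same noisy samples are fitted with a rigid and a very flexible model.

```python
import numpy as np

# Hypothetical 1-D training set: a smooth function observed with noise
rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0.0, 4.0, size=8))
y_train = np.sin(x_train) + 0.1 * rng.standard_normal(8)

# Two models of different complexity fitted to the same points
coeffs_simple = np.polyfit(x_train, y_train, deg=1)    # rigid model, may underfit
coeffs_flexible = np.polyfit(x_train, y_train, deg=7)  # interpolates the training noise

x_test = np.linspace(0.0, 4.0, 5)
print(np.polyval(coeffs_simple, x_test))    # smooth but biased estimate
print(np.polyval(coeffs_flexible, x_test))  # hits the training points, oscillates elsewhere
```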
Regression Algorithms in this Course
- Support vector regression (Support Vector Machine)
- Relevance vector regression (Relevance Vector Machine)
- Gaussian process regression (Gaussian Process)
- Boosting approaches: random projections, random Gaussians, random forest, gradient boosting
Not covered in class: locally weighted projected regression.
Regression Algorithms in this Course
This part of the course: support vector regression (Support Vector Machine), relevance vector regression (Relevance Vector Machine), and boosting with random projections.
Support Vector Regression
Support Vector Regression
Assume a nonlinear mapping \( f \), s.t. \( y = f(x) \). How do we estimate \( f \) to best predict the pairs of training points \( \{x^i, y^i\}_{i=1,\dots,M} \)?
How can the support vector machine framework for classification be generalized to estimate continuous functions?
1. Assume a non-linear mapping into a feature space and then perform linear regression in that feature space.
2. Supervised learning minimizes an error function, so we first need a way to measure the prediction error; we start with the linear case.
Support Vector Regression
Assume a linear mapping \( f \), s.t. \( y = f(x) = w^T x + b \).
How do we estimate \( w \) and \( b \) to best predict the pairs of training points \( \{x^i, y^i\}_{i=1,\dots,M} \)?
(Note: \( b \) is estimated, as in SVM, through least-squares regression on the support vectors; hence we omit it from the rest of the development.)
[Figure: linear fit \( y = f(x) = w^T x + b \); measure the error on prediction.]
Support Vector Regression
Set an upper bound \( \varepsilon \) on the error and consider as correctly predicted all points such that \( |f(x) - y| \le \varepsilon \).
Penalize only datapoints that are not contained in the \( \varepsilon \)-tube.
[Figure: \( \varepsilon \)-tube around \( y = f(x) = w^T x + b \).]
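As a small illustration (not from the slides), the penalty just described corresponds to the \( \varepsilon \)-insensitive loss; a minimal sketch, with an arbitrarily chosen \( \varepsilon \):

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    """Zero inside the eps-tube, linear penalty |f(x) - y| - eps outside it."""
    residual = np.abs(y_true - y_pred)
    return np.maximum(0.0, residual - eps)

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.05, 2.5, 2.0])
print(eps_insensitive_loss(y_true, y_pred, eps=0.1))  # [0.  0.4 0.9]
```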
Support Vector Regression
The \( \varepsilon \)-margin is a measure of the width of the \( \varepsilon \)-insensitive tube; it is a measure of the precision of the regression.
A small \( \|w\| \) corresponds to a small slope for \( f \): in the linear case, \( f \) is more horizontal.
[Figure: \( \varepsilon \)-margin for a flat linear fit.]
Support Vector Regression
A large \( \|w\| \) corresponds to a large slope for \( f \): in the linear case, \( f \) is more vertical.
The flatter the slope of the function \( f \), the larger the \( \varepsilon \)-margin. To maximize the margin, we must minimize the norm of \( w \).
[Figure: \( \varepsilon \)-margin for a steep linear fit.]
Support Vector Regression
This can be rephrased as a constrained optimization problem of the form:
\[
\min_{w, b} \; \frac{1}{2}\|w\|^2
\quad \text{subject to} \quad
\begin{cases}
\langle w, x^i \rangle + b - y^i \le \varepsilon \\
y^i - \langle w, x^i \rangle - b \le \varepsilon
\end{cases}
\quad \forall i = 1, \dots, M
\]
We still need to penalize points outside the \( \varepsilon \)-insensitive tube.
Support Vector Regression
Introduce slack variables \( \xi_i, \xi_i^* \ge 0 \) to penalize points outside the \( \varepsilon \)-insensitive tube:
\[
\min_{w, b, \xi, \xi^*} \; \frac{1}{2}\|w\|^2 + \frac{C}{M}\sum_{i=1}^{M}\left(\xi_i + \xi_i^*\right)
\quad \text{subject to} \quad
\begin{cases}
\langle w, x^i \rangle + b - y^i \le \varepsilon + \xi_i \\
y^i - \langle w, x^i \rangle - b \le \varepsilon + \xi_i^* \\
\xi_i \ge 0, \; \xi_i^* \ge 0
\end{cases}
\]
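A minimal sketch of this soft-constrained primal, written directly with a generic convex solver (cvxpy); the data, \( \varepsilon \), and \( C \) are hypothetical choices, not values from the slides:

```python
import numpy as np
import cvxpy as cp

# Hypothetical 1-D training data around a linear trend
rng = np.random.default_rng(0)
M, eps, C = 40, 0.1, 1.0
X = rng.uniform(-2.0, 2.0, size=(M, 1))
y = 1.5 * X.ravel() - 0.5 + 0.2 * rng.standard_normal(M)

w = cp.Variable(1)
b = cp.Variable()
xi = cp.Variable(M, nonneg=True)       # slack for f(x) - y > eps
xi_star = cp.Variable(M, nonneg=True)  # slack for y - f(x) > eps

residual = X @ w + b - y
objective = cp.Minimize(0.5 * cp.sum_squares(w) + (C / M) * cp.sum(xi + xi_star))
constraints = [residual <= eps + xi, -residual <= eps + xi_star]
cp.Problem(objective, constraints).solve()

print("w =", w.value, " b =", b.value)
```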
Support Vector Regression
With the slack variables \( \xi_i, \xi_i^* \ge 0 \) and the optimization problem above, all points outside the \( \varepsilon \)-tube become support vectors.
We now have the solution to the linear regression problem. How do we generalize it to the nonlinear case?
Support Vector Regression
Lift \( x \) into a feature space and then perform linear regression in that feature space.
Linear case: \( y = f(x) = \langle w, x \rangle + b \)
Non-linear case: \( x \to \phi(x), \quad y = f(x) = \langle w, \phi(x) \rangle + b \)
Note that \( w \) now lives in feature space.
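An illustrative sketch (not from the slides): an explicit polynomial feature map \( \phi \) followed by linear SVR in that feature space. In practice the map is kept implicit through a kernel, as developed on the next slides; the degree, \( \varepsilon \), and \( C \) below are arbitrary choices.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import LinearSVR

rng = np.random.default_rng(1)
X = rng.uniform(-2.0, 2.0, size=(200, 1))
y = np.sin(2.0 * X.ravel()) + 0.05 * rng.standard_normal(200)

# phi(x): explicit polynomial features, then linear epsilon-insensitive regression
model = make_pipeline(PolynomialFeatures(degree=5),
                      LinearSVR(epsilon=0.05, C=10.0, max_iter=50000))
model.fit(X, y)

print(model.predict(np.array([[0.5]])))  # close to sin(1.0) ~ 0.84
```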
Support Vector Regression
In feature space, we obtain the same constrained optimization problem:
\[
\min_{w, b, \xi, \xi^*} \; \frac{1}{2}\|w\|^2 + \frac{C}{M}\sum_{i=1}^{M}\left(\xi_i + \xi_i^*\right)
\quad \text{subject to} \quad
\begin{cases}
\langle w, \phi(x^i) \rangle + b - y^i \le \varepsilon + \xi_i \\
y^i - \langle w, \phi(x^i) \rangle - b \le \varepsilon + \xi_i^* \\
\xi_i \ge 0, \; \xi_i^* \ge 0
\end{cases}
\]
Support Vector Regression
Again, we can solve this quadratic problem by introducing sets of Lagrange multipliers and writing the Lagrangian (Lagrangian = objective function + multipliers \( \times \) constraints):
\[
\begin{aligned}
L(w, b, \xi, \xi^*) = \; & \frac{1}{2}\|w\|^2 + \frac{C}{M}\sum_{i=1}^{M}\left(\xi_i + \xi_i^*\right) - \sum_{i=1}^{M}\left(\eta_i \xi_i + \eta_i^* \xi_i^*\right) \\
& - \sum_{i=1}^{M} \alpha_i \left(\varepsilon + \xi_i + y^i - \langle w, \phi(x^i) \rangle - b\right) \\
& - \sum_{i=1}^{M} \alpha_i^* \left(\varepsilon + \xi_i^* - y^i + \langle w, \phi(x^i) \rangle + b\right)
\end{aligned}
\]
Support Vector Regression
By complementary slackness, \( \alpha_i = \alpha_i^* = 0 \) for all points that satisfy the constraints with strict inequality, i.e. points strictly inside the \( \varepsilon \)-tube. Points on or outside the tube have \( \alpha_i > 0 \) or \( \alpha_i^* > 0 \) (with slacks \( \xi_i, \xi_i^* \)), the two constraints corresponding to points lying on either side of the \( \varepsilon \)-tube.
[Figure: multipliers and slacks for points outside the \( \varepsilon \)-tube; the Lagrangian is the one given on the previous slide.]
Support Vector Regression
Requiring that the partial derivatives of the Lagrangian are zero:
\[
\frac{\partial L}{\partial b} = \sum_{i=1}^{M} \left(\alpha_i - \alpha_i^*\right) = 0
\quad \Rightarrow \quad \sum_{i=1}^{M} \alpha_i = \sum_{i=1}^{M} \alpha_i^*
\]
(rebalancing the effect of the support vectors on both sides of the \( \varepsilon \)-tube), and
\[
\frac{\partial L}{\partial w} = w - \sum_{i=1}^{M} \left(\alpha_i^* - \alpha_i\right) \phi(x^i) = 0
\quad \Rightarrow \quad w = \sum_{i=1}^{M} \left(\alpha_i^* - \alpha_i\right) \phi(x^i)
\]
so \( w \) is a linear combination of the support vectors.
Support Vector Regression
Replacing these expressions in the primal Lagrangian, we get the dual optimization problem (kernel trick: \( k(x^i, x^j) = \langle \phi(x^i), \phi(x^j) \rangle \)):
\[
\max_{\alpha, \alpha^*} \;
- \frac{1}{2} \sum_{i,j=1}^{M} \left(\alpha_i^* - \alpha_i\right)\left(\alpha_j^* - \alpha_j\right) k(x^i, x^j)
- \varepsilon \sum_{i=1}^{M} \left(\alpha_i + \alpha_i^*\right)
+ \sum_{i=1}^{M} y^i \left(\alpha_i^* - \alpha_i\right)
\]
\[
\text{subject to} \quad \sum_{i=1}^{M} \left(\alpha_i - \alpha_i^*\right) = 0
\quad \text{and} \quad \alpha_i, \alpha_i^* \in \left[0, \tfrac{C}{M}\right]
\]
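The dual only needs inner products in feature space, so the kernel is evaluated directly. A minimal sketch (not from the slides) of the Gram matrix \( K_{ij} = k(x^i, x^j) \) for an RBF kernel, with an assumed bandwidth parameter gamma:

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=1.0):
    """RBF kernel k(x, x') = exp(-gamma * ||x - x'||^2) for all pairs of rows."""
    sq_dists = (np.sum(X1**2, axis=1)[:, None]
                + np.sum(X2**2, axis=1)[None, :]
                - 2.0 * X1 @ X2.T)
    return np.exp(-gamma * sq_dists)

X = np.array([[0.0], [1.0], [2.0]])
K = rbf_kernel(X, X, gamma=0.5)  # M x M Gram matrix used in the dual objective
print(K)
```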
Support Vector Regression
The solution is given by:
\[
y = f(x) = \sum_{i=1}^{M} \left(\alpha_i^* - \alpha_i\right) k(x^i, x) + b
\]
The terms \( (\alpha_i^* - \alpha_i) \) are linear coefficients (Lagrange multipliers, one pair per constraint). If one uses an RBF kernel, \( f \) is a sum of \( M \) un-normalized isotropic Gaussians centered on each training datapoint.
Support Vector Regression
The solution is given by:
\[
y = f(x) = \sum_{i=1}^{M} \left(\alpha_i^* - \alpha_i\right) k(x^i, x) + b
\]
The kernel places a Gaussian function on each support vector.
[Figure: regression function built from Gaussian kernels centered on the support vectors.]
Support Vector Regression
The solution is given by:
\[
y = f(x) = \sum_{i=1}^{M} \left(\alpha_i^* - \alpha_i\right) k(x^i, x) + b
\]
The Lagrange multipliers define the importance of each Gaussian function, and \( f(x) \) converges to \( b \) where the effect of the support vectors vanishes.
[Figure: \( y = f(x) \) built from six support vectors \( x^1, \dots, x^6 \), each weighted by its Lagrange multiplier (values between 1 and 3 in the example).]
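To connect the formula to practice, a sketch (not from the slides) that fits an RBF-kernel SVR with scikit-learn and then rebuilds the prediction by hand from the support vectors, their signed multipliers, and \( b \); the data and hyperparameters are arbitrary choices:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(2)
X = rng.uniform(-3.0, 3.0, size=(150, 1))
y = np.sin(X.ravel()) + 0.05 * rng.standard_normal(150)

svr = SVR(kernel="rbf", C=10.0, epsilon=0.05, gamma=0.5)
svr.fit(X, y)

# Rebuild f(x) = sum_i (signed multiplier_i) * k(x^i, x) + b from the fitted model;
# dual_coef_ holds the signed Lagrange multiplier of each support vector.
X_new = np.array([[0.3], [1.2]])
K = rbf_kernel(svr.support_vectors_, X_new, gamma=0.5)  # k(x^i, x) for each SV
manual = svr.dual_coef_ @ K + svr.intercept_

print(manual.ravel())
print(svr.predict(X_new))  # matches the manual reconstruction
```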