A motivation for polynomial regression

We have obtained input-output pairs $\{(x_t, y_t)\}_t$ over the last 200 time steps and aim to model their relationship.

[Figure: scatter plots of $y$ against $x$ for the 200 pairs, shown alone and with a fitted straight line]

Using linear regression does not look like such a good idea...
Linear regression

A simple linear relation is assumed between $x$ and $y$, i.e.,
\[
y_t = \beta_0 + \beta_1 x_t + \varepsilon_t, \qquad t = t_n - n, \ldots, t_n
\]
where
- $\beta_0$ and $\beta_1$ are the model parameters (called intercept and slope)
- $\varepsilon_t$ is a noise term, which you may see as the forecast error we want to minimize

The linear regression model can be reformulated in a more compact form as
\[
y_t = \beta^\top x_t + \varepsilon_t, \qquad t = t_n - n, \ldots, t_n
\]
with
\[
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}, \qquad
x_t = \begin{bmatrix} 1 \\ x_t \end{bmatrix}
\]
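To make the model and its compact vector form concrete, here is a minimal NumPy sketch. It is a simulation under assumed values: the parameters beta0 and beta1, the noise level, and the input range are all illustrative, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative "true" parameters (assumed, not from the slides)
beta0, beta1 = 0.5, 0.4      # intercept and slope
n = 200                      # number of time steps, as in the example

x = rng.uniform(0.0, 5.0, size=n)       # inputs x_t
eps = rng.normal(0.0, 0.2, size=n)      # noise term eps_t
y = beta0 + beta1 * x + eps             # y_t = beta0 + beta1 * x_t + eps_t

# Compact form: stack each x_t = [1, x_t]^T as a row of the design matrix X,
# so that y = X @ beta + eps with beta = [beta0, beta1]^T
X = np.column_stack([np.ones(n), x])
beta = np.array([beta0, beta1])
assert np.allclose(y, X @ beta + eps)
```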
Least Squares (LS) estimation

Now we need to find the best value of $\beta$ that describes this cloud of points.

Under a number of assumptions, which we overlook here, the (best) model parameters $\hat{\beta}$ can be readily obtained with Least-Squares (LS) estimation.

The Least-Squares (LS) estimate $\hat{\beta}$ of the linear regression model parameters is given by
\[
\hat{\beta} = \arg\min_{\beta} \sum_t \varepsilon_t^2
            = \arg\min_{\beta} \sum_t \left( y_t - \beta^\top x_t \right)^2
            = (X^\top X)^{-1} X^\top y
\]
with
\[
\hat{\beta} = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{bmatrix}, \qquad
X = \begin{bmatrix}
1 & x_{t_n - n} \\
1 & x_{t_n - n + 1} \\
\vdots & \vdots \\
1 & x_{t_n}
\end{bmatrix}, \qquad
y = \begin{bmatrix} y_{t_n - n} \\ y_{t_n - n + 1} \\ \vdots \\ y_{t_n} \end{bmatrix}
\]
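A minimal sketch of the normal-equation formula in NumPy, on illustrative simulated data (the "true" parameters are assumed). Forming $(X^\top X)^{-1}$ explicitly is fine for this small example, but np.linalg.lstsq is the numerically safer way to solve the same problem:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0.0, 5.0, size=200)
y = 0.5 + 0.4 * x + rng.normal(0.0, 0.2, size=200)   # illustrative data

X = np.column_stack([np.ones_like(x), x])            # rows are [1, x_t]

# Closed-form LS estimate: beta_hat = (X^T X)^{-1} X^T y
beta_hat = np.linalg.inv(X.T @ X) @ (X.T @ y)

# Numerically safer equivalent: solve the least-squares problem directly
beta_hat_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)                    # close to the assumed "true" [0.5, 0.4]
assert np.allclose(beta_hat, beta_hat_lstsq)
```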
Extending to polynomial regression

We could also assume, more generally, a polynomial relation between $x$ and $y$, i.e.,
\[
y_t = \beta_0 + \sum_{p=1}^{P} \beta_p x_t^p + \varepsilon_t, \qquad t = t_n - n, \ldots, t_n
\]
where
- $\beta_p$, $p = 0, \ldots, P$, are the model parameters
- $\varepsilon_t$ is a noise term, which you may see as the forecast error we want to minimize

This polynomial regression can be reformulated in a more compact form as
\[
y_t = \beta^\top x_t + \varepsilon_t, \qquad t = t_n - n, \ldots, t_n
\]
with
\[
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_P \end{bmatrix}, \qquad
x_t = \begin{bmatrix} 1 \\ x_t \\ \vdots \\ x_t^P \end{bmatrix}
\]
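As a sketch of this compact formulation, the feature vectors $x_t = (1, x_t, \ldots, x_t^P)^\top$ can be stacked row-wise into a Vandermonde-style design matrix; assuming NumPy, with the data and $P$ purely illustrative:

```python
import numpy as np

def poly_design_matrix(x: np.ndarray, P: int) -> np.ndarray:
    """Stack x_t = (1, x_t, ..., x_t^P) row-wise into an (n, P+1) matrix."""
    return np.vander(x, N=P + 1, increasing=True)

x = np.array([0.0, 1.0, 2.0])
print(poly_design_matrix(x, P=2))
# [[1. 0. 0.]
#  [1. 1. 1.]
#  [1. 2. 4.]]
```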
Least Squares (LS) estimation

As the model is still linear in its parameters, we can still use LS estimation!

The Least-Squares (LS) estimate $\hat{\beta}$ of the polynomial regression model parameters is given by
\[
\hat{\beta} = \arg\min_{\beta} \sum_t \varepsilon_t^2
            = \arg\min_{\beta} \sum_t \left( y_t - \beta^\top x_t \right)^2
            = (X^\top X)^{-1} X^\top y
\]
with
\[
\hat{\beta} = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \vdots \\ \hat{\beta}_P \end{bmatrix}, \qquad
X = \begin{bmatrix}
1 & x_{t_n - n} & x_{t_n - n}^2 & \cdots & x_{t_n - n}^P \\
1 & x_{t_n - n + 1} & x_{t_n - n + 1}^2 & \cdots & x_{t_n - n + 1}^P \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{t_n} & x_{t_n}^2 & \cdots & x_{t_n}^P
\end{bmatrix}, \qquad
y = \begin{bmatrix} y_{t_n - n} \\ y_{t_n - n + 1} \\ \vdots \\ y_{t_n} \end{bmatrix}
\]
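Putting the pieces together, a minimal sketch of a polynomial LS fit on simulated data (the quadratic "truth" and the noise level are assumptions for illustration; np.polynomial.Polynomial.fit would be an idiomatic shortcut):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 5.0, size=200)
# Assumed quadratic "truth" plus noise, for illustration only
y = 0.3 + 0.8 * x - 0.12 * x**2 + rng.normal(0.0, 0.1, size=200)

P = 2
X = np.vander(x, N=P + 1, increasing=True)        # columns: 1, x, x^2
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)        # close to the assumed "true" [0.3, 0.8, -0.12]
```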
Going back to our example

We apply polynomial regression with $P = 2$ (quadratic) and $P = 3$ (cubic).

[Figure: scatter plots of $y$ against $x$ with the fitted quadratic (left) and cubic (right) curves]

- They both look quite a bit nicer than the simple linear fit
- We are lucky here that the relationship truly is quadratic... when fitting higher-order polynomials, the estimates come out as $\hat{\beta}_p \approx 0$ for $p > 2$ (see the sketch below)
- In general, higher-order polynomials may yield spurious results(!)
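To illustrate the last two bullets, a small sketch (again on assumed, simulated data): fitting a cubic to data whose true relationship is quadratic should drive the estimated coefficient on $x^3$ toward zero, while with noisier data or much higher $P$ the extra terms can pick up spurious structure:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 5.0, size=200)
y = 0.3 + 0.8 * x - 0.12 * x**2 + rng.normal(0.0, 0.1, size=200)  # truly quadratic

for P in (2, 3):
    X = np.vander(x, N=P + 1, increasing=True)    # columns: 1, x, ..., x^P
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(f"P={P}: beta_hat = {np.round(beta_hat, 3)}")
# For P=3 the coefficient on x^3 comes out close to 0 here, but with noisier
# data or much larger P such extra terms can absorb noise and mislead.
```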