Linear regression



1. CZECH TECHNICAL UNIVERSITY IN PRAGUE, Faculty of Electrical Engineering, Department of Cybernetics. Linear regression. Petr Pošík, © 2015, Artificial Intelligence.

2. Linear regression


3. Linear regression

A regression task is a supervised learning task, i.e.

■ a training (multi)set T = {(x^(1), y^(1)), …, (x^(|T|), y^(|T|))} is available, where
■ the labels y^(i) are quantitative, often continuous (as opposed to classification tasks, where the y^(i) are nominal).
■ Its purpose is to model the relationship between the independent variables (inputs) x = (x_1, …, x_D) and the dependent variable (output) y.

Linear regression is a particular regression model which assumes (and learns) a linear relationship between the inputs and the output:

ŷ = h(x) = w_0 + w_1·x_1 + … + w_D·x_D = w_0 + ⟨w, x⟩ = w_0 + x·wᵀ,

where

■ ŷ is the model prediction (an estimate of the true value y), and h(x) is the linear model (a hypothesis),
■ w_0, …, w_D are the coefficients of the linear function (w_0 is the bias), organized in a row vector w,
■ ⟨w, x⟩ is the dot (scalar) product of the vectors w and x,
■ which can also be computed as the matrix product x·wᵀ if w and x are row vectors.
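As a concrete illustration of the prediction formula above, here is a minimal NumPy sketch of ŷ = w_0 + ⟨w, x⟩ for a single example. The weights and feature values are made-up numbers, not anything from the slides:

```python
import numpy as np

w0 = 0.5                    # bias term w_0
w = np.array([1.0, -2.0])   # weights (w_1, ..., w_D) as a row vector
x = np.array([3.0, 0.5])    # one example with D = 2 features

y_hat = w0 + w @ x          # w_0 + dot product <w, x>
print(y_hat)                # 0.5 + 3.0 - 1.0 = 2.5
```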


4. Notation remarks

Homogeneous coordinates: if we add "1" as the first element of x, so that x = (1, x_1, …, x_D), then we can write the linear model in an even simpler form (without the explicit bias term):

ŷ = h(x) = w_0·1 + w_1·x_1 + … + w_D·x_D = ⟨w, x⟩ = x·wᵀ.

Matrix notation: if we organize the data into a matrix X and a vector y, such that the i-th row of X is the example x^(i) in homogeneous coordinates,

X = [ 1  x^(1)   ]        y = [ y^(1)   ]
    [ ⋮    ⋮     ]            [   ⋮     ]
    [ 1  x^(|T|) ]            [ y^(|T|) ]

and similarly with ŷ, then we can write a batch computation of predictions for all data in X as

ŷ = X·wᵀ.
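The same batch computation in NumPy might look as follows; the feature values and weights are invented for illustration:

```python
import numpy as np

X_raw = np.array([[3.0, 0.5],
                  [1.0, 2.0],
                  [0.0, 4.0]])   # |T| x D matrix of raw features
w = np.array([0.5, 1.0, -2.0])   # (w_0, w_1, w_2) as a row vector

# Prepend the "1" column (homogeneous coordinates), then predict
# for all examples at once: y_hat = X w^T.
X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])
y_hat = X @ w
print(y_hat)                     # one prediction per row of X
```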


5. Two operation modes

Any ML model has 2 operation modes:

1. learning (training, fitting), and
2. application (testing, making predictions).

The model h can be viewed as a function of 2 variables: h(x, w).

Model application: if the model is given (w is fixed), we can manipulate x to make predictions:

ŷ = h(x, w) = h_w(x).

Model learning: if the data is given (T is fixed), we can manipulate the model parameters w to fit the model to the data:

w* = argmin_w J(w, T).

How to train the model? A sketch of both modes is given below.
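A minimal sketch of the two modes, using np.linalg.lstsq (ordinary least squares) as the minimizer of J(w, T). The slides at this point only state the argmin; picking least squares here anticipates the method introduced below, and the training data is made up:

```python
import numpy as np

def h(X, w):
    """Model application: w is fixed, the inputs X vary."""
    return X @ w

def fit(X, y):
    """Model learning: the data (X, y) is fixed, w varies."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares w*
    return w

# Toy training set T, inputs already in homogeneous coordinates.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])                  # exactly y = 1 + 2x

w_star = fit(X, y)   # learning: manipulate w with T fixed
print(h(X, w_star))  # application: manipulate x with w fixed -> [1. 3. 5.]
```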


6. Simple (univariate) linear regression

Simple (univariate) regression deals with cases where each input x^(i) is a single scalar x^(i), i.e. the examples are described by a single feature (they are 1-dimensional).

Fitting a line to data:

■ find the parameters w_0, w_1 of a linear model ŷ = w_0 + w_1·x,
■ given a training (multi)set T = {(x^(i), y^(i))}, i = 1, …, |T|.

How the fit depends on the number of training examples:

■ Given a single example (1 equation, 2 parameters) ⇒ infinitely many linear functions can be fitted.
■ Given 2 examples (2 equations, 2 parameters) ⇒ exactly 1 linear function can be fitted (see the sketch after this list).
■ Given 3 or more examples (> 2 equations, 2 parameters) ⇒ in general, no line can be fitted without error ⇒ a line which minimizes the "size" of the error y − ŷ can be fitted:

w* = (w*_0, w*_1) = argmin_{w_0, w_1} J(w_0, w_1, T).
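For the middle case (2 examples, 2 parameters), the unique line can be found by solving a 2×2 linear system. A sketch with made-up points:

```python
import numpy as np

# Two examples give two equations y^(i) = w_0 + w_1 * x^(i)
# in the two unknowns (w_0, w_1).
x = np.array([1.0, 3.0])
y = np.array([2.0, 8.0])

A = np.column_stack([np.ones_like(x), x])  # [[1, x1], [1, x2]]
w0, w1 = np.linalg.solve(A, y)             # the unique solution
print(w0, w1)                              # -1.0, 3.0 => y = -1 + 3x
```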

7. The least squares method

The least squares method (LSM) suggests choosing the parameters w which minimize the mean squared error

J(w) = (1/|T|) · Σ_{i=1}^{|T|} (y^(i) − ŷ^(i))² = (1/|T|) · Σ_{i=1}^{|T|} (y^(i) − h_w(x^(i)))².

[Figure: a line ŷ = w_0 + w_1·x fitted to three points (x^(i), y^(i)) in the x–y plane; vertical segments mark the residuals |y^(i) − ŷ^(i)|, w_0 is the intercept, and w_1 is the rise of the line per unit step in x.]
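To make J(w) concrete, here is a small sketch that evaluates the mean squared error of two candidate parameter pairs on a made-up training set:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])   # roughly y = 1 + 2x plus noise

def J(w0, w1):
    y_hat = w0 + w1 * x              # predictions of h_w
    return np.mean((y - y_hat) ** 2) # mean squared error

print(J(1.0, 2.0))   # near the data-generating line -> small error (0.025)
print(J(0.0, 0.0))   # constant-zero model -> much larger error
```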
