
Linear regression: DS-GA 1002 Statistical and Mathematical Models


  1. Linear regression
     DS-GA 1002: Statistical and Mathematical Models
     http://www.cims.nyu.edu/~cfgranda/pages/DSGA1002_fall15
     Carlos Fernandez-Granda

  2. Linear models, Least-squares estimation, Overfitting, Example: Global warming

  3. Regression
     The aim is to learn a function $h$ that relates
     ◮ a response or dependent variable $y$
     ◮ to several observed variables $x_1, x_2, \ldots, x_p$, known as covariates, features or independent variables
     The response is assumed to be of the form
     $$ y = h(\vec{x}) + z $$
     where $\vec{x} \in \mathbb{R}^p$ contains the features and $z$ is noise.

  4. Linear regression
     The regression function $h$ is assumed to be linear:
     $$ y^{(i)} = \vec{x}^{(i)\,T} \vec{\beta}^* + z^{(i)}, \qquad 1 \le i \le n $$
     Our aim is to estimate $\vec{\beta}^* \in \mathbb{R}^p$ from the data.

  5. Linear regression
     In matrix form
     $$
     \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(n)} \end{bmatrix}
     =
     \begin{bmatrix}
       x_1^{(1)} & x_2^{(1)} & \cdots & x_p^{(1)} \\
       x_1^{(2)} & x_2^{(2)} & \cdots & x_p^{(2)} \\
       \vdots    & \vdots    & \ddots & \vdots    \\
       x_1^{(n)} & x_2^{(n)} & \cdots & x_p^{(n)}
     \end{bmatrix}
     \begin{bmatrix} \beta_1^* \\ \beta_2^* \\ \vdots \\ \beta_p^* \end{bmatrix}
     +
     \begin{bmatrix} z^{(1)} \\ z^{(2)} \\ \vdots \\ z^{(n)} \end{bmatrix}
     $$
     Equivalently,
     $$ \vec{y} = X \vec{\beta}^* + \vec{z} $$
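
As a minimal sketch of the matrix form, the following NumPy snippet draws synthetic data from $\vec{y} = X\vec{\beta}^* + \vec{z}$. The sizes `n` and `p`, the noise level `sigma`, the seed and the helper name `simulate_linear_model` are illustrative assumptions, not values from the slides.

```python
import numpy as np

def simulate_linear_model(n, p, sigma, rng):
    """Draw (X, beta_true, y) from y = X @ beta_true + z with Gaussian noise z."""
    X = rng.standard_normal((n, p))        # row i holds the features x^(i)
    beta_true = rng.standard_normal(p)     # plays the role of beta^*
    z = sigma * rng.standard_normal(n)     # iid noise, one entry per example
    y = X @ beta_true + z
    return X, beta_true, y

rng = np.random.default_rng(0)             # seed chosen arbitrarily
X, beta_true, y = simulate_linear_model(n=100, p=3, sigma=0.1, rng=rng)
print(X.shape, beta_true.shape, y.shape)
```

Each row of `X` is one example and each column one feature, matching the layout of the matrix above.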

  6. Linear model for GDP
     State            Population    Unemployment rate (%)    GDP (USD millions)
     California       38 332 521    5.5                      2 448 467
     Minnesota         5 420 380    4.0                        334 780
     Oregon            3 930 065    5.5                        228 120
     Nevada            2 790 136    5.8                        141 204
     Idaho             1 612 136    3.8                         65 202
     Alaska              735 132    6.9                         54 256
     South Carolina    4 774 839    4.9                        ???

  7. Linear model for GDP
     After normalizing the features and the response
     $$
     \vec{y} := \begin{bmatrix} 0.984 \\ 0.135 \\ 0.092 \\ 0.057 \\ 0.026 \\ 0.022 \end{bmatrix},
     \qquad
     X := \begin{bmatrix}
       0.982 & 0.419 \\
       0.139 & 0.305 \\
       0.101 & 0.419 \\
       0.071 & 0.442 \\
       0.041 & 0.290 \\
       0.019 & 0.526
     \end{bmatrix}
     $$
     Aim: find $\vec{\beta} \in \mathbb{R}^2$ such that $\vec{y} \approx X \vec{\beta}$
     The estimate for the GDP of South Carolina will be $\vec{x}_{\text{sc}}^{\,T} \vec{\beta}$
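
The slides do not say how the data were normalized. The values are consistent with dividing each column by its Euclidean norm, so the sketch below assumes that; the variable names are illustrative.

```python
import numpy as np

# Raw data for the six states with known GDP (population, unemployment rate in %).
features = np.array([
    [38_332_521, 5.5],   # California
    [5_420_380, 4.0],    # Minnesota
    [3_930_065, 5.5],    # Oregon
    [2_790_136, 5.8],    # Nevada
    [1_612_136, 3.8],    # Idaho
    [735_132, 6.9],      # Alaska
])
gdp = np.array([2_448_467, 334_780, 228_120, 141_204, 65_202, 54_256], dtype=float)

# Assumption: each column is scaled to unit Euclidean norm.
feature_norms = np.linalg.norm(features, axis=0)
gdp_norm = np.linalg.norm(gdp)

X = features / feature_norms
y = gdp / gdp_norm

# South Carolina is scaled with the same norms as the other states.
x_sc = np.array([4_774_839, 4.9]) / feature_norms

print(np.round(X, 3))
print(np.round(y, 3))
```

Keeping `feature_norms` and `gdp_norm` around matters: a prediction made in normalized units has to be multiplied by `gdp_norm` to get back to USD millions.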

  9. Least squares
     For fixed $\vec{\beta}$ we can evaluate the error using
     $$ \bigl\| \vec{y} - X\vec{\beta} \bigr\|_2^2 = \sum_{i=1}^{n} \left( y^{(i)} - \vec{x}^{(i)\,T} \vec{\beta} \right)^2 $$
     The least-squares estimate $\vec{\beta}_{\text{LS}}$ minimizes this cost function
     $$ \vec{\beta}_{\text{LS}} := \arg\min_{\vec{\beta}} \bigl\| \vec{y} - X\vec{\beta} \bigr\|_2 $$
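
One way to compute the least-squares estimate in practice is NumPy's `np.linalg.lstsq`. The sketch below uses synthetic data (sizes, coefficients and seed are arbitrary assumptions) and checks that perturbing the estimate does not lower the cost; the helper names are illustrative.

```python
import numpy as np

def least_squares(X, y):
    """Return a minimizer of ||y - X @ beta||_2 over beta."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def squared_error(X, y, beta):
    """Cost from the slide: the sum of squared residuals."""
    residual = y - X @ beta
    return float(residual @ residual)

# Synthetic data (illustrative sizes and seed, not from the slides).
rng = np.random.default_rng(1)
X = rng.standard_normal((50, 2))
y = X @ np.array([1.0, -2.0]) + 0.1 * rng.standard_normal(50)

beta_ls = least_squares(X, y)

# A perturbed coefficient vector should never achieve a lower cost.
perturbed = beta_ls + 0.05 * rng.standard_normal(2)
print(squared_error(X, y, beta_ls) <= squared_error(X, y, perturbed))
```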

  10. Least-squares fit
      [Figure: data and least-squares fit; axes x and y, both from 0.0 to 1.2]

  11. Linear model for GDP
      The least-squares estimate is
      $$ \vec{\beta}_{\text{LS}} = \begin{bmatrix} 1.010 \\ -0.019 \end{bmatrix} $$
      GDP is roughly proportional to the population
      Unemployment doesn’t help (linearly)
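
As a rough check, the estimate can be recomputed from the normalized data printed on the earlier slide. Those values are rounded to three decimals, so the result should be expected to agree with the slide only to about two decimals.

```python
import numpy as np

# Normalized response and features as printed on the slide (rounded to 3 decimals).
y = np.array([0.984, 0.135, 0.092, 0.057, 0.026, 0.022])
X = np.array([
    [0.982, 0.419],
    [0.139, 0.305],
    [0.101, 0.419],
    [0.071, 0.442],
    [0.041, 0.290],
    [0.019, 0.526],
])

beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_ls, 3))   # first entry: population, second: unemployment
```

The first coefficient is close to 1 and the second close to 0, which is what the two remarks on the slide summarize.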

  12. Linear model for GDP
      State             GDP (USD millions)    Estimate (USD millions)
      California        2 448 467             2 446 186
      Minnesota           334 780               334 584
      Oregon              228 120               233 460
      Nevada              141 204               159 088
      Idaho                65 202                90 345
      Alaska               54 256                23 050
      South Carolina      199 256               289 903

  13. Geometric interpretation
      ◮ Any vector $X\vec{\beta}$ is in the span of the columns of $X$
      ◮ The least-squares estimate yields the vector $X\vec{\beta}_{\text{LS}}$ that is closest to $\vec{y}$ among all vectors that can be represented in this way
      ◮ This is the projection of $\vec{y}$ onto the column space of $X$
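
The projection property can be checked numerically. The sketch below uses random data of arbitrary size (an assumption for illustration) and verifies two things: the residual is orthogonal to the columns of $X$, and the fitted vector equals the projection computed from an orthonormal basis of the column space.

```python
import numpy as np

rng = np.random.default_rng(2)           # illustrative seed
X = rng.standard_normal((6, 2))          # 6 examples, 2 features (arbitrary sizes)
y = rng.standard_normal(6)

beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
projection = X @ beta_ls                 # closest point to y in the column space of X

# The residual is orthogonal to every column of X.
residual = y - projection
print(np.allclose(X.T @ residual, 0.0))

# The same point is obtained from an orthonormal basis Q of the column space.
Q, _ = np.linalg.qr(X)                   # reduced QR: Q has shape (6, 2)
print(np.allclose(Q @ (Q.T @ y), projection))
```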

  14. Geometric interpretation

  15. Probabilistic interpretation
      We model the noise as an iid Gaussian random vector $\vec{Z}$
      Entries have zero mean and variance $\sigma^2$
      The data are a realization of the random vector
      $$ \vec{Y} := X\vec{\beta} + \vec{Z} $$
      $\vec{Y}$ is Gaussian with mean $X\vec{\beta}$ and covariance matrix $\sigma^2 I$

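  16. Likelihood
      The joint pdf of $\vec{Y}$ is
      $$
      \begin{aligned}
      f_{\vec{Y}}(\vec{a})
        &:= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}
            \exp\left( -\frac{1}{2\sigma^2} \left( a_i - (X\vec{\beta})_i \right)^2 \right) \\
        &= \frac{1}{\sqrt{(2\pi)^n}\,\sigma^n}
            \exp\left( -\frac{1}{2\sigma^2} \bigl\| \vec{a} - X\vec{\beta} \bigr\|_2^2 \right)
      \end{aligned}
      $$
      The likelihood is the joint pdf evaluated at the observed data $\vec{y}$, viewed as a function of $\vec{\beta}$
      $$
      \mathcal{L}_{\vec{y}}(\vec{\beta})
        = \frac{1}{\sqrt{(2\pi)^n}\,\sigma^n}
          \exp\left( -\frac{1}{2\sigma^2} \bigl\| \vec{y} - X\vec{\beta} \bigr\|_2^2 \right)
      $$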
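
Taking the logarithm of the Gaussian likelihood turns the product into a sum. As a worked step, keeping the noise variance $\sigma^2$ explicit:

```latex
\log \mathcal{L}_{\vec{y}}(\vec{\beta})
  = -\frac{n}{2}\log(2\pi) \;-\; n\log\sigma
    \;-\; \frac{1}{2\sigma^2}\,\bigl\| \vec{y} - X\vec{\beta} \bigr\|_2^2
```

Only the last term depends on $\vec{\beta}$, and it enters with a negative sign, so a larger likelihood corresponds to a smaller squared residual norm, whatever the value of $\sigma > 0$.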

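  17. Maximum-likelihood estimate
      The maximum-likelihood estimate is
      $$
      \begin{aligned}
      \vec{\beta}_{\text{ML}}
        &= \arg\max_{\vec{\beta}} \; \mathcal{L}_{\vec{y}}(\vec{\beta}) \\
        &= \arg\max_{\vec{\beta}} \; \log \mathcal{L}_{\vec{y}}(\vec{\beta}) \\
        &= \arg\min_{\vec{\beta}} \; \bigl\| \vec{y} - X\vec{\beta} \bigr\|_2^2 \\
        &= \vec{\beta}_{\text{LS}}
      \end{aligned}
      $$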
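
A quick numerical sanity check of this equivalence, not a proof: the sketch below evaluates the Gaussian log-likelihood at the least-squares estimate and at many random perturbations of it. The data, the value of `sigma`, the seed and the function name are illustrative assumptions.

```python
import numpy as np

def log_likelihood(beta, X, y, sigma):
    """Gaussian log-likelihood of beta for y = X @ beta + noise with variance sigma**2."""
    n = len(y)
    residual = y - X @ beta
    return (-0.5 * n * np.log(2 * np.pi) - n * np.log(sigma)
            - residual @ residual / (2 * sigma**2))

rng = np.random.default_rng(3)
sigma = 0.5
X = rng.standard_normal((40, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + sigma * rng.standard_normal(40)

beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# No random perturbation of beta_ls should reach a higher log-likelihood.
best = log_likelihood(beta_ls, X, y, sigma)
trials = [log_likelihood(beta_ls + 0.1 * rng.standard_normal(3), X, y, sigma)
          for _ in range(1000)]
print(best >= max(trials))
```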

  19. Temperature predictor
      A friend tells you:
      "I found a cool way to predict the temperature in New York: It’s just a linear combination of the temperature in every other state. I fit the model on data from the last month and a half and it’s perfect!"
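
The friend's result is less impressive than it sounds. If the model has at least as many coefficients as there are observations, least squares can generally fit any response exactly. The sketch below assumes one observation per day for 45 days and 49 other states as features, and uses purely random data in which the response has no relationship to the features.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 45, 49                              # assumed: 45 days, 49 other states

X_train = rng.standard_normal((n, p))      # stand-in for temperatures in other states
y_train = rng.standard_normal(n)           # stand-in for New York, unrelated to X_train

beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
train_error = np.linalg.norm(X_train @ beta - y_train) / np.linalg.norm(y_train)

# Fresh data from the same (unrelated) distribution expose the overfit.
X_test = rng.standard_normal((n, p))
y_test = rng.standard_normal(n)
test_error = np.linalg.norm(X_test @ beta - y_test) / np.linalg.norm(y_test)

print(f"training error: {train_error:.2e}, test error: {test_error:.2f}")
```

The training error is zero up to rounding even though there is nothing to learn, while the error on new data is large.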

  20. Overfitting
      If a model is very complex, it may overfit the data
      To evaluate a model we separate the data into a training set and a test set
      1. We fit the model using the training set
      2. We evaluate the error on the test set

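  21. Experiment
      $X_{\text{train}}$, $X_{\text{test}}$, $\vec{z}_{\text{train}}$ and $\vec{\beta}^*$ are iid Gaussian with mean 0 and variance 1
      $$ \vec{y}_{\text{train}} = X_{\text{train}} \vec{\beta}^* + \vec{z}_{\text{train}} $$
      $$ \vec{y}_{\text{test}} = X_{\text{test}} \vec{\beta}^* $$
      We use $\vec{y}_{\text{train}}$ and $X_{\text{train}}$ to compute $\vec{\beta}_{\text{LS}}$
      $$ \text{error}_{\text{train}} = \frac{ \bigl\| X_{\text{train}} \vec{\beta}_{\text{LS}} - \vec{y}_{\text{train}} \bigr\|_2 }{ \bigl\| \vec{y}_{\text{train}} \bigr\|_2 } $$
      $$ \text{error}_{\text{test}} = \frac{ \bigl\| X_{\text{test}} \vec{\beta}_{\text{LS}} - \vec{y}_{\text{test}} \bigr\|_2 }{ \bigl\| \vec{y}_{\text{test}} \bigr\|_2 } $$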
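
A minimal sketch of this experiment follows. The slide does not give the number of features or the size of the test set, so `p = 50` and `n_test = 1000` are assumptions, as are the seed and the function name `run_experiment`.

```python
import numpy as np

def run_experiment(n, p, n_test, rng):
    """Return (training error, test error) for one random draw of the experiment."""
    beta_true = rng.standard_normal(p)
    X_train = rng.standard_normal((n, p))
    X_test = rng.standard_normal((n_test, p))
    z_train = rng.standard_normal(n)

    y_train = X_train @ beta_true + z_train   # noisy training responses
    y_test = X_test @ beta_true               # noiseless test responses

    beta_ls, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

    error_train = np.linalg.norm(X_train @ beta_ls - y_train) / np.linalg.norm(y_train)
    error_test = np.linalg.norm(X_test @ beta_ls - y_test) / np.linalg.norm(y_test)
    return error_train, error_test

rng = np.random.default_rng(5)
p, n_test = 50, 1000                          # assumed values, not given on the slide
for n in (50, 100, 200, 300, 400, 500):       # training-set sizes to compare
    error_train, error_test = run_experiment(n, p, n_test, rng)
    print(f"n={n:3d}  train={error_train:.3f}  test={error_test:.3f}")
```

When `n` is close to `p` the training error is near zero while the test error is large. As `n` grows, the training error rises toward the relative noise level and the test error shrinks.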

  22. Experiment
      [Figure: relative error (l2 norm) versus n, for n from 50 to 500; curves: Error (training), Error (test), Noise level (training)]

  24. Maximum temperatures in Oxford, UK
      [Figure: temperature (Celsius) versus year, 1860 to 2000]

  25. Maximum temperatures in Oxford, UK
      [Figure: temperature (Celsius) versus year, 1900 to 1905]

  26. Linear model
      $$ y_t \approx \beta_0 + \beta_1 \cos\left( \frac{2\pi t}{12} \right) + \beta_2 \sin\left( \frac{2\pi t}{12} \right) + \beta_3 \, t $$
      $1 \le t \le n$ is the time in months ($n = 12 \cdot 150$)
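
Although the model contains a cosine and a sine, it is linear in the coefficients, so it can be fitted by ordinary least squares once the four terms are stacked as columns of a design matrix. The sketch below uses a synthetic series as a stand-in for the temperature data: the coefficients, noise level and seed are made up for illustration and are not fitted to the Oxford measurements.

```python
import numpy as np

def design_matrix(t):
    """Columns: intercept, annual cosine, annual sine, linear trend (t in months)."""
    return np.column_stack([
        np.ones_like(t),
        np.cos(2 * np.pi * t / 12),
        np.sin(2 * np.pi * t / 12),
        t,
    ])

n = 12 * 150
t = np.arange(1, n + 1, dtype=float)

# Synthetic stand-in for the data (all values below are illustrative assumptions).
rng = np.random.default_rng(6)
beta_true = np.array([13.0, -6.0, -3.0, 0.001])
y = design_matrix(t) @ beta_true + rng.standard_normal(n)

beta_ls, *_ = np.linalg.lstsq(design_matrix(t), y, rcond=None)

# The trend coefficient is in degrees per month; convert to degrees per 100 years.
trend_per_century = beta_ls[3] * 12 * 100
print(np.round(beta_ls[:3], 2), round(trend_per_century, 2))
```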

  27. Model fitted by least squares
      [Figure: data and model, temperature (Celsius) versus year, 1860 to 2000]

  28. Model fitted by least squares
      [Figure: data and model, temperature (Celsius) versus year, 1900 to 1905]

  29. Model fitted by least squares
      [Figure: data and model, temperature (Celsius) versus year, 1960 to 1965]

  30. Trend: Increase of 0.75 °C / 100 years (1.35 °F)
      [Figure: data and trend, temperature (Celsius) versus year, 1860 to 2000]

  31. Model for minimum temperatures
      [Figure: data and model, temperature (Celsius) versus year, 1860 to 2000]

  32. Model for minimum temperatures
      [Figure: data and model, temperature (Celsius) versus year, 1900 to 1905]

  33. Model for minimum temperatures
      [Figure: data and model, temperature (Celsius) versus year, 1960 to 1965]

  34. Trend: Increase of 0.88 °C / 100 years (1.58 °F)
      [Figure: data and trend, temperature (Celsius) versus year, 1860 to 2000]
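
Assuming the quoted trends come from the $\beta_3 t$ term of the linear model, with $t$ in months, a coefficient $\beta_3$ in °C per month corresponds to $1200\,\beta_3$ °C per 100 years. Converting a temperature difference to Fahrenheit needs only the factor 9/5, without the offset of 32 used for absolute temperatures. As a check of the two quoted figures:

```latex
0.75\,^{\circ}\mathrm{C} \times \tfrac{9}{5} = 1.35\,^{\circ}\mathrm{F},
\qquad
0.88\,^{\circ}\mathrm{C} \times \tfrac{9}{5} = 1.584\,^{\circ}\mathrm{F} \approx 1.58\,^{\circ}\mathrm{F}
```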
