

  1. Unsupervised Machine Learning and Data Mining, DS 5230 / DS 4420 - Fall 2018, Lecture 6, Jan-Willem van de Meent

  2. Regression

  3. Curve Fitting (according to XKCD) https://xkcd.com/2048/

  4. Linear Regression. Goal: approximate points with a line or hyper-surface.

  5. Linear Regression. Assume f is a linear combination of D features: y = f(x) + ε, with f(x) = wᵀx = Σ_d w_d x_d and ε ∼ Norm(0, σ²). For N points we write yₙ = wᵀxₙ + εₙ. Learning: estimate w. Prediction: estimate y′ given x′.
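
A minimal NumPy sketch of this generative model, using synthetic inputs and hypothetical values for w and σ (all numbers below are assumptions, not the lecture's):

import numpy as np

rng = np.random.default_rng(0)
N, D = 100, 3                         # number of points and features (assumed values)
w_true = rng.normal(size=D)           # hypothetical "true" weights
sigma = 0.1                           # noise standard deviation (assumed)

X = rng.normal(size=(N, D))           # N points with D features each
eps = rng.normal(0.0, sigma, size=N)  # eps ~ Norm(0, sigma^2)
y = X @ w_true + eps                  # y_n = w^T x_n + eps_n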

  6. Error Measure: Sum of Squares. Mean Squared Error (MSE): E(w) = (1/N) Σₙ₌₁ᴺ (wᵀxₙ − yₙ)² = (1/N) ‖Xw − y‖², where X is the matrix with rows x₁ᵀ, …, x_Nᵀ and y = (y₁, …, y_N)ᵀ.

  7. Minimizing the Error. E(w) = (1/N) ‖Xw − y‖². Setting the gradient to zero, ∇E(w) = (2/N) Xᵀ(Xw − y) = 0, gives XᵀXw = Xᵀy, so w = X†y, where X† = (XᵀX)⁻¹Xᵀ is the 'pseudo-inverse' of X.

  8. Minimizing the Error. As above, the minimizer is w = X†y with X† = (XᵀX)⁻¹Xᵀ. See the Matrix Cookbook (on the course website) for the matrix derivative identities used here.

  9. Ordinary Least Squares. Construct the matrix X and the vector y from the dataset {(x₁, y₁), (x₂, y₂), …, (x_N, y_N)} (each x includes x₀ = 1): X has rows x₁ᵀ, …, x_Nᵀ and y = (y₁, …, y_N)ᵀ. Compute X† = (XᵀX)⁻¹Xᵀ. Return w = X†y.
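
A NumPy sketch of this procedure, reusing the synthetic X and y from the earlier sketch; the explicit inverse mirrors the slide, though np.linalg.pinv or np.linalg.lstsq would be preferred numerically:

import numpy as np

def ols(X, y):
    """Ordinary least squares via the pseudo-inverse X† = (X^T X)^(-1) X^T."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend x_0 = 1 to each point
    X_pinv = np.linalg.inv(Xb.T @ Xb) @ Xb.T    # assumes X^T X is invertible
    return X_pinv @ y                           # w = X† y

# w_hat = ols(X, y)   # first entry estimates the bias, the rest estimate w_true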

  10. Basis Function Regression. Linear regression: f(x) = wᵀx. Basis function regression: f(x) = wᵀφ(x) = Σⱼ wⱼ φⱼ(x). For N samples, yₙ = wᵀφ(xₙ) + εₙ, so the design matrix has rows φ(xₙ)ᵀ. Polynomial regression uses the basis φⱼ(x) = xʲ.
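
A sketch of polynomial basis regression for one-dimensional inputs, assuming the basis φⱼ(x) = xʲ; the helper names and the toy sin(2πx) data are mine, chosen only to resemble the figures on the following slides:

import numpy as np

def polynomial_design(x, M):
    """Design matrix with columns phi_j(x) = x**j for j = 0, ..., M."""
    return np.column_stack([x**j for j in range(M + 1)])

def fit_polynomial(x, t, M):
    """Least-squares fit of an order-M polynomial to targets t."""
    Phi = polynomial_design(x, M)
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)   # minimizes ||Phi w - t||^2
    return w

# Toy data (assumed, not the lecture's): noisy samples of sin(2 pi x)
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=10)
t = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, size=10)
w3 = fit_polynomial(x, t, M=3)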

  11. Polynomial Regression. [Figure: polynomial fits of order M = 0, 1, 3, and 9 to noisy data, target t plotted against input x.]

  12. Polynomial Regression. [Same figure, annotated: the low-order fits (M = 0, M = 1) underfit the data.]

  13. Polynomial Regression. [Same figure, annotated: the M = 9 fit overfits the data.]

  14. Regularization. L2 regularization (ridge regression) minimizes E(w) = (1/N) ‖Xw − y‖² + λ‖w‖², where λ ≥ 0 and ‖w‖² = wᵀw. L1 regularization (LASSO) minimizes E(w) = (1/N) ‖Xw − y‖² + λ|w|₁, where λ ≥ 0 and |w|₁ = Σᵢ₌₁ᴰ |wᵢ|.

  15. Regularization

  16. Regularization. L2: closed-form solution w = (XᵀX + λI)⁻¹Xᵀy. L1: no closed-form solution; use quadratic programming: minimize ‖Xw − y‖² subject to ‖w‖₁ ≤ s.
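
A NumPy sketch of the L2 closed form above (the L1 case needs an iterative solver, for example scikit-learn's Lasso, which is not shown here):

import numpy as np

def ridge(X, y, lam):
    """L2-regularized (ridge) solution: w = (X^T X + lambda I)^(-1) X^T y."""
    D = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ y)

# w_ridge = ridge(X, y, lam=0.1)   # lam = 0 recovers ordinary least squares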

  17. Maximum Likelihood

  18. Regression: Probabilistic Interpretation. What is the probability of the observed outputs given the inputs and weights? With ε ∼ Norm(0, σ²), each output is distributed as p(yₙ | xₙ, w, σ) = Norm(yₙ; wᵀxₙ, σ²).

  19. Regression: Probabilistic Interpretation. Least squares objective: E(w) = (1/N) ‖Xw − y‖². Likelihood: p(y | X, w, σ) = Πₙ₌₁ᴺ Norm(yₙ; wᵀxₙ, σ²).

  20. Maximum Likelihood. Least squares objective: E(w) = (1/N) ‖Xw − y‖². Log-likelihood: log p(y | X, w, σ) = −(N/2) log(2πσ²) − (1/2σ²) Σₙ (yₙ − wᵀxₙ)². Maximizing the likelihood with respect to w minimizes the sum of squares.
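
A small numerical check of this equivalence (a sketch under the Gaussian noise model above; the function name and the σ value are assumptions): minimizing the negative log-likelihood over w recovers the least-squares solution.

import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(w, X, y, sigma=0.1):
    """-log p(y | X, w, sigma) for y_n ~ Norm(w^T x_n, sigma^2), dropping terms constant in w."""
    resid = X @ w - y
    return 0.5 * np.sum(resid**2) / sigma**2

# w_mle = minimize(neg_log_likelihood, x0=np.zeros(X.shape[1]), args=(X, y)).x
# np.allclose(w_mle, np.linalg.pinv(X) @ y, atol=1e-4)   # same answer as least squares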

  21. Maximum a Posteriori

  22. Regression with Priors. Can we maximize the posterior p(w | X, y)? (i.e. can we perform MAP estimation?)

  23. Regression with Priors. From Bayes' rule: p(w | X, y) ∝ p(y | X, w) p(w).

  24. Maximum a Posteriori. With a Gaussian prior on the weights, maximum a posteriori estimation is equivalent to ridge regression.
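
A sketch of that equivalence, assuming a prior w ∼ Norm(0, τ²I): the MAP objective is the ridge objective with λ = σ²/τ², so the MAP estimate matches the ridge closed form.

import numpy as np

def map_estimate(X, y, sigma, tau):
    """MAP weights for y_n ~ Norm(w^T x_n, sigma^2) with prior w ~ Norm(0, tau^2 I)."""
    lam = sigma**2 / tau**2                      # effective ridge penalty
    D = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ y)

# map_estimate(X, y, sigma=0.1, tau=1.0) matches ridge(X, y, lam=0.01)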
