

  1. Linear Regression. Aarti Singh, Machine Learning 10-701/15-781, Sept 27, 2010

  2. Discrete to Continuous Labels. Classification: X = Document, Y = Topic (Sports, Science, News); X = Cell Image, Y = Diagnosis (Anemic cell, Healthy cell). Regression: Stock Market Prediction, X = data through Feb 01, Y = ? (a continuous value).

  3. Regression Tasks. Weather Prediction: X = 7 pm (time), Y = Temp. Estimating Contamination: X = new location, Y = sensor reading.

  4. Supervised Learning. Goal: from labeled training data, learn a predictor f of Y from X (e.g., topic from document, stock value from data through Feb 01). Performance measure for Classification: Probability of Error, $P(f(X) \neq Y)$; for Regression: Mean Squared Error, $E[(f(X) - Y)^2]$.

  5. Regression. Optimal predictor: $f^*(X) = E[Y \mid X]$ (the Conditional Mean). Intuition: signal plus (zero-mean) noise model, $Y = f^*(X) + \epsilon$ with $E[\epsilon \mid X] = 0$.

  6. Regression. Optimal predictor: $f^* = \arg\min_f E[(f(X) - Y)^2] = E[Y \mid X]$. Proof strategy: dropping subscripts for notational convenience, add and subtract $E[Y \mid X]$ inside the square and show that the excess risk of any other predictor is ≥ 0.
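Filling in the steps the slide drops, a short LaTeX reconstruction of the standard argument (notation mine; f is any candidate predictor):

    \begin{aligned}
    E[(f(X)-Y)^2]
      &= E\big[(f(X) - E[Y \mid X] + E[Y \mid X] - Y)^2\big] \\
      &= E\big[(f(X) - E[Y \mid X])^2\big]
       + 2\,E\big[(f(X) - E[Y \mid X])(E[Y \mid X] - Y)\big]
       + E\big[(E[Y \mid X] - Y)^2\big].
    \end{aligned}

Conditioning on X kills the cross term: given X, the factor $f(X) - E[Y \mid X]$ is fixed and $E[\,E[Y \mid X] - Y \mid X\,] = 0$. Hence

    E[(f(X)-Y)^2] - E[(E[Y \mid X]-Y)^2] = E\big[(f(X) - E[Y \mid X])^2\big] \;\ge\; 0,

which is the "≥ 0" on the slide; equality holds iff $f(X) = E[Y \mid X]$ almost surely, so the conditional mean is the optimal predictor.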

  7. Regression. Optimal predictor: $f^*(X) = E[Y \mid X]$ (Conditional Mean). Intuition: signal plus (zero-mean) noise model. Catch: the conditional mean depends on the unknown distribution of (X, Y), so it cannot be computed directly from data.

  8. Regression algorithms: Linear Regression; Lasso and Ridge regression (Regularized Linear Regression); Nonlinear Regression; Kernel Regression; Regression Trees, Splines, Wavelet estimators, …

  9. Empirical Risk Minimization (ERM). Optimal predictor: $f^* = \arg\min_f E[(f(X) - Y)^2]$. Empirical Risk Minimizer: $\hat{f}_n = \arg\min_{f \in F} \frac{1}{n} \sum_{i=1}^n (f(X_i) - Y_i)^2$, where F is a class of predictors and the empirical mean approximates the true risk by the Law of Large Numbers. More later…

  10. ERM, you saw it before! Learning distributions: maximum likelihood = minimizing negative log-likelihood, which is an empirical risk. What is the class F? A class of parametric distributions, e.g., Bernoulli(θ) or Gaussian(μ, σ²).

  11. Linear Regression. Class of linear functions; univariate case: $f(X) = \beta_1 + \beta_2 X$, with $\beta_1$ the intercept and $\beta_2$ the slope. Multivariate case: $f(X) = X\beta$, where $X = (1, X^{(2)}, \dots, X^{(p)})$ includes a constant 1 so that $\beta_1$ remains the intercept. Least Squares Estimator: $\hat{\beta} = \arg\min_\beta \sum_{i=1}^n (Y_i - X_i \beta)^2$.

  12. Least Squares Estimator: minimize $J(\beta) = \sum_{i=1}^n (Y_i - X_i \beta)^2$ over $\beta$.

  13. Least Squares Estimator (continued): setting $\nabla_\beta J(\beta) = -2 X^T (Y - X\beta) = 0$ yields the normal equations of the next slide.

  14. Normal Equations: $(X^T X)\,\beta = X^T Y$, with $X^T X$ of size p×p and $\beta$, $X^T Y$ of size p×1. If $X^T X$ is invertible, $\hat{\beta} = (X^T X)^{-1} X^T Y$. When is $X^T X$ invertible? Recall: full-rank matrices are invertible, and rank$(X^T X)$ = rank$(X) \le \min(n, p)$, so X needs p linearly independent columns. What if $X^T X$ is not invertible? Regularization (later).
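A minimal numpy sketch of solving the normal equations (variable names and data are my illustrative choices; in practice np.linalg.lstsq is preferred, since it avoids forming X^T X explicitly):

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 100, 3
    X = np.hstack([np.ones((n, 1)), rng.standard_normal((n, p - 1))])  # column of 1s absorbs the intercept
    beta_true = np.array([2.0, -1.0, 0.5])
    y = X @ beta_true + 0.1 * rng.standard_normal(n)  # signal plus zero-mean noise

    # Normal equations: (X^T X) beta = X^T y; solvable when X has full column rank
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

    # Numerically preferable equivalent:
    beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)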

  15. Geometric Interpretation. Difference in prediction on the training set: $X^T (Y - X\hat{\beta}) = 0$, i.e., the residual is orthogonal to the columns of X; $X\hat{\beta}$ is the orthogonal projection of Y onto the linear subspace spanned by the columns of X.
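Continuing the numpy sketch above, the projection claim is easy to check numerically:

    # X @ beta_hat is the orthogonal projection of y onto span(columns of X),
    # so the residual y - X @ beta_hat is orthogonal to that subspace:
    print(X.T @ (y - X @ beta_hat))  # entries near zero, up to floating-point error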

  16. Revisiting Gradient Descent. Even when $X^T X$ is invertible, inverting a huge p×p matrix might be computationally expensive. Gradient descent finds the global minimum since $J(\beta)$ is convex. Initialize $\beta^0$; Update $\beta^{t+1} = \beta^t - \alpha \nabla J(\beta^t)$, where $\nabla J(\beta) = -2 X^T (Y - X\beta)$, which is 0 exactly when the normal equations hold; Stop when some criterion is met, e.g., a fixed number of iterations, or $\|\nabla J(\beta^t)\| < \varepsilon$.
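A minimal gradient descent sketch for this J(β); the default step size and stopping rule are illustrative choices of mine, not from the lecture:

    import numpy as np

    def gd_least_squares(X, y, alpha=1e-3, eps=1e-8, max_iters=100_000):
        """Minimize J(beta) = ||y - X beta||^2 by gradient descent."""
        beta = np.zeros(X.shape[1])               # Initialize
        for _ in range(max_iters):
            grad = -2 * X.T @ (y - X @ beta)      # gradient of J; zero at the normal equations
            if np.linalg.norm(grad) < eps:        # Stop: criterion met
                break
            beta = beta - alpha * grad            # Update: step against the gradient
        return beta

With the X, y from the earlier sketch, gd_least_squares(X, y) agrees with beta_hat to many decimal places; a larger alpha converges faster but can oscillate or diverge, as the next slide notes.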

  17. Effect of step-size α. Large α => fast convergence but larger residual error, and possibly oscillations. Small α => slow convergence but small residual error.

  18. Least Squares and MLE. Intuition: signal plus (zero-mean) noise model, $Y_i = X_i \beta + \epsilon_i$ with $\epsilon_i \sim N(0, \sigma^2)$. Maximizing the log-likelihood over $\beta$ minimizes the sum of squared errors: the Least Squares Estimate is the same as the Maximum Likelihood Estimate under a Gaussian model!
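Spelling out the log-likelihood the slide refers to: with $Y_i = X_i\beta + \epsilon_i$ and $\epsilon_i \sim N(0, \sigma^2)$ i.i.d.,

    \log \prod_{i=1}^{n} p(Y_i \mid X_i; \beta)
      = -\frac{n}{2}\log(2\pi\sigma^2)
        - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (Y_i - X_i\beta)^2 ,

so maximizing over $\beta$ is exactly minimizing $\sum_i (Y_i - X_i\beta)^2$: the MLE under Gaussian noise coincides with the least squares estimate.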

  19. Regularized Least Squares and MAP. What if $X^T X$ is not invertible? Maximize log likelihood + log prior instead. I) Gaussian prior: $\beta \sim N(0, \tau^2 I)$ gives Ridge Regression, $\hat{\beta}_{MAP} = \arg\min_\beta \sum_i (Y_i - X_i\beta)^2 + \lambda \|\beta\|_2^2$. Closed form: HW. The prior belief that β is Gaussian with zero mean biases the solution to "small" β.
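The slide leaves the closed form as homework; for illustration, a numpy sketch of the standard ridge solution (my convention here penalizes the intercept too, for simplicity):

    import numpy as np

    def ridge(X, y, lam):
        """argmin_beta ||y - X beta||^2 + lam * ||beta||_2^2."""
        p = X.shape[1]
        # X^T X + lam*I is positive definite for lam > 0, hence always invertible,
        # which fixes the non-invertibility problem raised on the slide
        return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)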

  20. Regularized Least Squares and MAP. What if $X^T X$ is not invertible? II) Laplace prior: $\beta_j \sim \text{Laplace}(0, t)$ gives the Lasso, $\hat{\beta}_{MAP} = \arg\min_\beta \sum_i (Y_i - X_i\beta)^2 + \lambda \|\beta\|_1$. The prior belief that β is Laplace with zero mean biases the solution to "small" β.

  21. Ridge Regression vs Lasso. Ridge Regression: ℓ2 penalty; Lasso: ℓ1 penalty. Ideally one would use an ℓ0 penalty (a hot topic!), but the optimization becomes non-convex. [Figure: level sets of J(β) in the (β1, β2) plane, overlaid with the sets of β with constant ℓ2, ℓ1, and ℓ0 norm.] The Lasso (ℓ1 penalty) results in sparse solutions, i.e., vectors with more zero coordinates. Good for high-dimensional problems: you don't have to store all coordinates!
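The lecture doesn't give a lasso algorithm; as one way to see the sparsity claim in code, here is a minimal proximal-gradient (ISTA) sketch, which is my choice of method, not the course's:

    import numpy as np

    def soft_threshold(z, t):
        # Prox of t * ||.||_1: shrink each coordinate toward zero;
        # coordinates with |z_j| <= t become exactly 0 -- the source of sparsity
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def lasso_ista(X, y, lam, n_iters=5000):
        """argmin_beta ||y - X beta||^2 + lam * ||beta||_1 via proximal gradient."""
        beta = np.zeros(X.shape[1])
        step = 1.0 / (2 * np.linalg.norm(X, 2) ** 2)  # 1/L, L = Lipschitz constant of the gradient
        for _ in range(n_iters):
            grad = -2 * X.T @ (y - X @ beta)
            beta = soft_threshold(beta - step * grad, step * lam)
        return beta

For sufficiently large lam, many coordinates of the returned β are exactly zero, whereas ridge merely shrinks them toward zero without zeroing them out.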

  22. Beyond Linear Regression: polynomial regression; regression with nonlinear features/basis functions; kernel regression (local/weighted regression); regression trees (spatially adaptive regression).

  23. Polynomial Regression. Univariate (1-d) case: $f(X) = \beta_1 + \beta_2 X + \beta_3 X^2 + \dots + \beta_m X^{m-1}$, where the powers of X are the nonlinear features and the $\beta_j$ are the weight of each feature. The model is still linear in β, so least squares applies unchanged.
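Since the model is linear in β, polynomial fitting reduces to ordinary least squares on nonlinear features; a minimal sketch (the degree and the test signal are illustrative):

    import numpy as np

    def poly_fit(x, y, degree):
        """Least squares fit of f(x) = b_1 + b_2 x + ... + b_{degree+1} x^degree."""
        Phi = np.vander(x, degree + 1, increasing=True)  # columns [1, x, x^2, ..., x^degree]
        beta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
        return beta

    x = np.linspace(-1, 1, 50)
    y = np.sin(np.pi * x) + 0.1 * np.random.default_rng(1).standard_normal(50)
    beta = poly_fit(x, y, degree=5)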

  24. Polynomial Regression: interactive least squares demo at http://mste.illinois.edu/users/exner/java.f/leastsquares/

  25. Nonlinear Regression: $f(X) = \sum_j \beta_j h_j(X)$, with basis coefficients $\beta_j$ and nonlinear features/basis functions $h_j$. Fourier basis: good representation for oscillatory functions. Wavelet basis: good representation for functions localized at multiple scales.
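The same least squares machinery works with any fixed basis; a sketch using a small Fourier basis on [0, 1] (the basis size k and the square-wave target are my illustrative choices):

    import numpy as np

    def fourier_features(x, k):
        """Basis functions h_j(x): constant, cos(2*pi*j*x), sin(2*pi*j*x), j = 1..k."""
        cols = [np.ones_like(x)]
        for j in range(1, k + 1):
            cols.append(np.cos(2 * np.pi * j * x))
            cols.append(np.sin(2 * np.pi * j * x))
        return np.column_stack(cols)

    x = np.linspace(0, 1, 200)
    y = np.sign(np.sin(4 * np.pi * x)) + 0.1 * np.random.default_rng(2).standard_normal(200)
    H = fourier_features(x, k=8)
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # basis coefficients
    y_hat = H @ beta  # an oscillatory target is captured well by a small Fourier basis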

  26. Local Regression: $f(X) = \sum_j \beta_j h_j(X)$, with basis coefficients and nonlinear features/basis functions as before. Globally supported basis functions (polynomial, Fourier) will not yield a good representation for functions with local structure.

  27. Local Regression (continued): since globally supported basis functions (polynomial, Fourier) will not yield a good representation, locally supported bases are used instead.

  28. What you should know: Linear Regression (Least Squares Estimator, Normal Equations, Gradient Descent, geometric and probabilistic interpretations, connection to MLE); Regularized Linear Regression (connection to MAP): Ridge Regression, Lasso; Polynomial Regression, basis (Fourier, wavelet) estimators. Next time: Kernel Regression (localized) and Regression Trees.
