  1. Linear Regression CSCI 447/547 MACHINE LEARNING

  2. Outline  Linear Models  1D Ordinary Least Squares (OLS)  Solution of OLS  Interpretation  Anscombe’s Quartet  Multivariate OLS  OLS Pros and Cons

  3. Optional Reading

  4. Terminology  Features (Covariates or predictors)  Labels (Variates or targets)  Regression  Classification

  5. Types of Machine Learning  Unsupervised  Finding structure in data  Supervised  Predict from given data  Classification (e.g. Logistic Regression) – categorical output data  Regression / Prediction (e.g. OLS) – continuous output data  [Figure: height vs. weight scatter plots for women and men, illustrating unsupervised structure-finding and supervised classification]

  6. What is a Linear Model?  Predict housing prices  Depends on:  Area  # of bedrooms  # of bathrooms  Hypothesis is that the relationship is linear  Price = k_1(Area) + k_2(#bed) + k_3(#bath)  ŷ = a_0 + a_1 x_1 + a_2 x_2 + …
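
As a quick illustration of the hypothesis above, here is a minimal Python sketch of a hand-written linear pricing model; the coefficients k1, k2, k3 and the intercept are made-up values for demonstration, not learned from data.

```python
import numpy as np

# Hypothetical coefficients for illustration only -- a real model would
# learn these from data (price per square foot, value per bedroom/bathroom, ...).
k1, k2, k3 = 150.0, 10_000.0, 7_500.0   # area, #bed, #bath weights
b = 50_000.0                            # intercept (base price)

def predict_price(area_sqft, n_bed, n_bath):
    """Linear model: price = b + k1*area + k2*#bed + k3*#bath."""
    return b + k1 * area_sqft + k2 * n_bed + k3 * n_bath

print(predict_price(1500, 3, 2))  # -> 320000.0
```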

  7. Why Use Linear Models?  Interpretable  Relationships are easy to see  Low Complexity  Prevents overfitting  Scalable  Scale up to more data, larger problems  Baseline  Can benchmark other methods against them

  8. Examples of Use  MNIST dataset – handwritten digits  Best performance – neural networks and regularization  99.79% accurate  Takes about a day to train  More difficult to build  Logistic Regression  92.5% accurate  Takes seconds to train  Can be built with less expertise  Linear models are building blocks of later techniques
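
The quoted 92.5% figure is for the full 28×28 MNIST images; the sketch below instead uses scikit-learn's small built-in 8×8 digits dataset (assuming scikit-learn is installed), so the exact accuracy will differ, but it shows how quickly a linear classifier trains.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Small 8x8 digits dataset stands in for MNIST here; accuracy will differ
# from the 92.5% quoted for the full 28x28 MNIST images.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

clf = LogisticRegression(max_iter=1000)   # multinomial logistic regression
clf.fit(X_train, y_train)                 # trains in seconds
print("test accuracy:", clf.score(X_test, y_test))
```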

  9. Optional Reading

  10. Definition of 1-Dimensional OLS  The Problem Statement  i is an observation; we have N of them, i = 1…N  x is the independent variable (feature)  y is the dependent variable (output variable)  y = ax + b, where a and b are constants  ŷ_i = a x_i + b  OR  y_i = a x_i + b + ε_i  Two unknowns – want to solve for a and b
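
A small Python sketch of this setup: it generates synthetic data from y_i = a·x_i + b + ε_i with arbitrarily chosen "true" a and b, the kind of data the following slides fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# True (but normally unknown) parameters, chosen arbitrarily for the demo.
a_true, b_true = 2.0, -1.0
N = 100

x = rng.uniform(0, 10, size=N)     # independent variable (feature)
eps = rng.normal(0, 1.0, size=N)   # noise term
y = a_true * x + b_true + eps      # y_i = a*x_i + b + eps_i
```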

  11. The Loss Function  L = Σ_{i=1}^{N} (y_i – ŷ_i)²  Goal is to minimize this function  Using ŷ_i = a x_i + b, the equation becomes:  L = Σ_{i=1}^{N} (y_i – a x_i – b)²  So this is the equation we want to minimize
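
The loss is straightforward to evaluate in code. A minimal sketch with a hypothetical ols_loss helper and toy data:

```python
import numpy as np

def ols_loss(a, b, x, y):
    """Sum of squared residuals: L = sum_i (y_i - (a*x_i + b))^2."""
    residuals = y - (a * x + b)
    return np.sum(residuals ** 2)

# Toy data; the loss is smaller for parameters closer to the data-generating line.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.1, 4.9, 7.2])   # roughly y = 2x + 1
print(ols_loss(2.0, 1.0, x, y))      # small
print(ols_loss(0.0, 0.0, x, y))      # much larger
```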

  12. Solution of OLS  Derivation  L = Σ_{i=1}^{N} (y_i – a x_i – b)²  Want to minimize L  Take the derivative of the loss function with respect to each parameter:  ∂L/∂a = 0, ∂L/∂b = 0  ∂L/∂a = Σ_{i=1}^{N} 2(y_i – a x_i – b)(–x_i) = 0  ⇒ Σ_{i=1}^{N} x_i y_i – a Σ_{i=1}^{N} x_i² – b Σ_{i=1}^{N} x_i = 0

  13. Solution of OLS  Derivation  ∂L/∂b = Σ_{i=1}^{N} 2(y_i – a x_i – b)(–1) = 0  ⇒ Σ_{i=1}^{N} y_i – a Σ_{i=1}^{N} x_i – bN = 0  ⇒ b = (1/N) Σ_{i=1}^{N} y_i – (a/N) Σ_{i=1}^{N} x_i  This is the closed-form solution for b

  14. Solution of OLS  Derivation  From the first equation:  Σ_{i=1}^{N} x_i y_i – a Σ_{i=1}^{N} x_i² – b Σ_{i=1}^{N} x_i = 0  ⇒ Σ_{i=1}^{N} x_i y_i = a Σ_{i=1}^{N} x_i² + ((1/N) Σ_{i=1}^{N} y_i – (a/N) Σ_{i=1}^{N} x_i) Σ_{i=1}^{N} x_i  ⇒ a = (Σ_{i=1}^{N} x_i y_i – (1/N) Σ_{i=1}^{N} x_i Σ_{i=1}^{N} y_i) / (Σ_{i=1}^{N} x_i² – (1/N)(Σ_{i=1}^{N} x_i)²)  This is the closed-form solution for a
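
Putting both closed-form expressions together, here is a minimal NumPy sketch (the ols_1d helper is illustrative, not from the slides); np.polyfit with deg=1 should recover essentially the same line.

```python
import numpy as np

def ols_1d(x, y):
    """Closed-form 1-D OLS slope and intercept from the derivation above."""
    N = len(x)
    a = (np.sum(x * y) - np.sum(x) * np.sum(y) / N) / (np.sum(x ** 2) - np.sum(x) ** 2 / N)
    b = np.mean(y) - a * np.mean(x)
    return a, b

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 50)
y = 2.0 * x - 1.0 + rng.normal(0, 0.5, 50)

a, b = ols_1d(x, y)
print(a, b)                       # close to 2.0 and -1.0
print(np.polyfit(x, y, deg=1))    # same slope/intercept from NumPy's built-in fit
```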

  15. Solution of OLS  Optimal Choices

  16. Interpretation  Interpretation of a and b  a is the slope of the line – the tangent of the angle θ between the line and the x-axis; it gives the effect of the independent variable x on the dependent variable y  b is the intercept of the line  [Figure: fitted line at angle θ; x – independent variable, y – dependent variable]

  17. Interpretation  Interpretation of L  L = Σ_{i=1}^{N} (y_i – ŷ_i)²  Expresses how well the solution captures the variation in the data  R² = 1 – MSE/Var(y)  R² ∈ [0, 1]
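
A short sketch of the R² computation exactly as defined above, with made-up y and ŷ values:

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - MSE / Var(y); 1 means a perfect fit, 0 means no better than the mean."""
    mse = np.mean((y - y_hat) ** 2)
    return 1.0 - mse / np.var(y)

y     = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8])
print(r_squared(y, y_hat))   # close to 1: the fit explains most of the variation
```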

  18. Interpretation

  19. Anscombe’s Quartet

  20. Anscombe’s Quartet  Same values for mean, variance, and best-fit line  R² values are the same for each example  But … linear regression may not be the best model for the last three examples
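
One way to check this numerically (assuming seaborn is installed; sns.load_dataset fetches the quartet data over the network):

```python
import numpy as np
import seaborn as sns   # assumes seaborn is available; load_dataset downloads the data

df = sns.load_dataset("anscombe")
for name, group in df.groupby("dataset"):
    a, b = np.polyfit(group["x"], group["y"], deg=1)
    print(f"dataset {name}: mean_y={group['y'].mean():.2f}, "
          f"var_y={group['y'].var():.2f}, fit y = {a:.2f}x + {b:.2f}")
# All four datasets give nearly identical statistics and fitted lines,
# even though their scatter plots look very different.
```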

  21. Multivariable OLS  Definition of Model  Data Matrix  The Loss Function

  22. Multivariable OLS  i = an observation  N = number of observations, i = 1…N  M = number of features  x_i = [x_{i1}, x_{i2}, …, x_{iM}]  y_i – dependent variable  Data matrix: X = [[x_{11}, x_{12}, …, x_{1M}], …, [x_{N1}, x_{N2}, …, x_{NM}]]  (one row per observation, one column per feature)

  23. Multivariable OLS  Data matrix: X = [[x_{11}, x_{12}, …, x_{1M}], …, [x_{N1}, x_{N2}, …, x_{NM}]]  In 1-D, y = ax + b·(1)  Add a column of all 1’s to the left of the data matrix so the bias term is included  ŷ_i = B_0 + B_1 x_{i1} + B_2 x_{i2} + … + B_M x_{iM} = x_i · B, where B = [B_0, B_1, …, B_M]^T  In matrix form: ŷ = XB
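
A minimal NumPy sketch of the data-matrix construction: prepend a column of ones so that B_0 plays the role of the intercept, and predictions become a single matrix-vector product. The feature values and coefficients here are made up for illustration.

```python
import numpy as np

# Hypothetical feature matrix: N=4 observations, M=2 features.
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.5],
              [4.0, 3.0]])

# Prepend a column of ones so that B_0 acts as the intercept in y_hat = X @ B.
X_with_bias = np.hstack([np.ones((X.shape[0], 1)), X])

B = np.array([0.5, 2.0, -1.0])     # [B_0, B_1, B_2]
y_hat = X_with_bias @ B            # one prediction per row/observation
print(y_hat)
```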

  24. Multivariable OLS  Loss Function  L = Σ_{i=1}^{N} (y_i – ŷ_i)²  Still want to minimize L  L = Σ_{i=1}^{N} (y_i – (B_0 + B_1 x_{i1} + … + B_M x_{iM}))²  L = Σ_{i=1}^{N} (y_i – x_i B)²  In norm form – the squared L2 norm of the residual vector:  L = ‖y – XB‖²_2 = (y – XB)^T (y – XB)

  25. Optimization  A Few Facts from Matrix Calculus  d(ax)/dx = a  d(ax²)/dx = 2ax  (the vector/matrix analogues of these are used on the next slide)

  26. Optimization  Minimizing the Loss  L = (y – XB)^T (y – XB)  Set ∂L/∂B = 0  ∂/∂B [(y – XB)^T (y – XB)] = 0  ∂/∂B [y^T y – y^T XB – B^T X^T y + B^T X^T XB] = 0  (using (XY)^T = Y^T X^T)  –(X^T y) – (X^T y) + 2(X^T X)B = 0  X^T y = (X^T X)B  B = (X^T X)^{-1} X^T y  (assuming X^T X is invertible, which holds when X has full column rank, i.e., none of its columns are linear combinations of the others)
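
A sketch of the normal-equations solution on synthetic data (B_true is invented for the demo); solving the linear system is preferred over forming the explicit inverse:

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 200, 3
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, M))])   # bias column + M features
B_true = np.array([1.0, 2.0, -3.0, 0.5])
y = X @ B_true + rng.normal(0, 0.1, size=N)

# Normal equations: B = (X^T X)^{-1} X^T y.
# np.linalg.solve avoids forming the explicit inverse;
# np.linalg.lstsq is more robust still when X^T X is near-singular.
B = np.linalg.solve(X.T @ X, X.T @ y)
print(B)                                       # close to B_true
print(np.linalg.lstsq(X, y, rcond=None)[0])    # same result
```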

  27. OLS Pros and Cons  Pros  Efficient to compute  Unique minimum  Stable under perturbation of the data  Easy to interpret  Cons  Influenced by outliers  (X^T X)^{-1} may not exist  Features may not be linearly independent
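
To see the invertibility caveat concretely, the sketch below builds a data matrix whose third column is an exact multiple of the second, so X^T X is singular; np.linalg.lstsq still returns a minimum-norm solution, but the coefficients are no longer uniquely determined:

```python
import numpy as np

rng = np.random.default_rng(3)
x1 = rng.normal(size=(50, 1))
# Third column is an exact multiple of the second: the columns are linearly dependent.
X = np.hstack([np.ones((50, 1)), x1, 2.0 * x1])

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))   # 2, not 3: X^T X is singular (not invertible)
print(np.linalg.cond(XtX))          # enormous condition number

# lstsq still returns a (non-unique) minimum-norm solution in this case.
y = X @ np.array([1.0, 2.0, 0.0])
B, *_ = np.linalg.lstsq(X, y, rcond=None)
print(B)
```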

  28. Summary  Linear Models  1D Ordinary Least Squares (OLS)  Solution of OLS  Interpretation  Anscombe’s Quartet  Multivariate OLS  OLS Pros and Cons
