

  1. Least Mean Squares Regression (Machine Learning)

  2-3. Least Squares Method for regression • Examples • The LMS objective • Gradient descent • Incremental/stochastic gradient descent

  4. What’s the mileage? Suppose we want to predict the mileage of a car from its weight and age. What we want: a function that can predict mileage using x_1 and x_2.
      Weight x_1 (x 100 lb)   Age x_2 (years)   Mileage
      31.5                    6                 21
      36.2                    2                 25
      43.1                    0                 18
      27.6                    2                 30

  5-6. Linear regression: The strategy. Predicting continuous values using a linear model. Assumption: the output is a linear function of the inputs: Mileage = w_0 + w_1 x_1 + w_2 x_2. The w_i are the parameters of the model, also called weights; collectively, they form a vector w. Learning: use the training data to find the best possible value of w. Prediction: given the values of x_1, x_2 for a new car, use the learned w to predict the mileage for the new car.

  7. Linear regression: The strategy. For simplicity, we will assume that the first feature is always 1; this makes the notation easier.
      • Inputs are vectors: x ∈ R^d, with x = [1, x_2, ..., x_d]^T
      • Outputs are real numbers: y ∈ R
      • We have a training set D = { (x_1, y_1), (x_2, y_2), ... }
      • We want to approximate y as y = w_1 + w_2 x_2 + ... + w_d x_d = w^T x, where w is the learned weight vector in R^d
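
A minimal sketch of the prediction y = w^T x, assuming NumPy; the weight vector below is made up for illustration (it is not a learned or recommended value), and the data rows come from the mileage table on slide 4:

```python
import numpy as np

# Mileage data from slide 4; the leading 1.0 in each row is the fixed first feature x_1 = 1.
X = np.array([[1.0, 31.5, 6.0],
              [1.0, 36.2, 2.0],
              [1.0, 43.1, 0.0],
              [1.0, 27.6, 2.0]])
y = np.array([21.0, 25.0, 18.0, 30.0])

# Hypothetical weight vector w, NOT learned yet, just to show the form of the model.
w = np.array([40.0, -0.5, 0.3])

predictions = X @ w          # each entry is w^T x_i for one car
print(predictions)
```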

  8-11. Examples. [Scatter plot of one-dimensional input data: y against the input feature.] Predict using y = w_1 + w_2 x_2. The linear function is not our only choice; we could have tried to fit the data as another polynomial. [Second panel: two-dimensional input.] Predict using y = w_1 + w_2 x_2 + w_3 x_3.
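
To illustrate the "another polynomial" remark, here is a small sketch (assuming NumPy; the one-dimensional data below is made up) that fits the same kind of points with a straight line and with a cubic:

```python
import numpy as np

# Made-up one-dimensional data, for illustration only.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1, 5.8])

line  = np.polyfit(x, y, deg=1)   # linear fit
cubic = np.polyfit(x, y, deg=3)   # a higher-degree polynomial is also a valid model choice

print(np.polyval(line, 2.5), np.polyval(cubic, 2.5))
```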

  12. Least Squares Method for regression • Examples • The LMS objective • Gradient descent • Incremental/stochastic gradient descent

  13-17. What is the best weight vector? Question: How do we know which weight vector is the best one for a training set? For an input (x_i, y_i) in the training set, the cost of a mistake is the error y_i - w^T x_i. Define the cost (or loss) for a particular weight vector w to be the sum of squared costs over the training set: J(w) = (1/2) Σ_i (y_i - w^T x_i)^2. One strategy for learning: find the w with least cost on this data.
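
A minimal sketch of this loss, assuming NumPy and the X, y, w arrays from the first snippet; the 1/2 factor follows the definition above:

```python
import numpy as np

def lms_cost(w, X, y):
    """Sum of squared costs: J(w) = 1/2 * sum_i (y_i - w^T x_i)^2."""
    errors = y - X @ w            # one error per training example
    return 0.5 * np.sum(errors ** 2)

# e.g. lms_cost(np.array([40.0, -0.5, 0.3]), X, y) with the mileage data above
```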

  18. Least Mean Squares (LMS) Regression. Learning: minimizing mean squared error.

  19. Least Mean Squares (LMS) Regression. Learning: minimizing mean squared error. Different strategies exist for learning by optimization. • Gradient descent is a popular algorithm. (For this particular minimization objective, there is also an analytical solution, so gradient descent is not strictly needed.)
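
The slide only notes that an analytical solution exists; as a hedged sketch (assuming NumPy and the X, y arrays from the first snippet), the least-cost weights can be obtained directly with a standard solver:

```python
import numpy as np

# Closed-form least-squares fit: lstsq minimizes ||X w - y||^2 directly.
w_star, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(w_star)     # the weight vector with least cost on this training data
```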

  20. Least Squares Method for regression • Examples • The LMS objective • Gradient descent • Incremental/stochastic gradient descent

  21-26. Gradient descent. We are trying to minimize J(w). General strategy for minimizing a function J(w):
      • Start with an initial guess for w, say w^0.
      • Iterate till convergence: compute the gradient of J at w^t, then update w^t to get w^(t+1) by taking a step in the opposite direction of the gradient.
      Intuition: the gradient is the direction of steepest increase in the function. To get to the minimum, go in the opposite direction.
      [Figure: the curve J(w) plotted against w, with successive iterates w^0, w^1, w^2, w^3 stepping downhill toward the minimum.]
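
To make the general recipe concrete, here is a small sketch that is not from the slides: a toy objective J(w) = (w - 3)^2, whose gradient is 2(w - 3), minimized with the iterate-until-convergence loop described above:

```python
def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Generic descent: repeatedly step opposite to the gradient."""
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)      # w^(t+1) = w^t - lr * grad_J(w^t)
    return w

# Toy objective J(w) = (w - 3)^2, minimum at w = 3; its gradient is 2 * (w - 3).
print(gradient_descent(lambda w: 2 * (w - 3), w0=0.0))   # approaches 3.0
```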

  27-28. Gradient descent for LMS. We are trying to minimize J(w).
      1. Initialize w^0.
      2. For t = 0, 1, 2, ...:
         a. Compute the gradient of J(w) at w^t; call it ∇J(w^t). (What is the gradient of J?)
         b. Update w as follows: w^(t+1) = w^t - r ∇J(w^t).
      r is called the learning rate (for now, a small constant; we will get to this later).

  29-36. Gradient of the cost. We are trying to minimize J(w) = (1/2) Σ_i (y_i - w^T x_i)^2. The gradient is of the form ∇J(w) = [∂J/∂w_1, ∂J/∂w_2, ..., ∂J/∂w_d]. Remember that w is a vector with d elements: w = [w_1, w_2, w_3, ..., w_j, ..., w_d]. One element of the gradient vector: ∂J/∂w_j = -Σ_i (y_i - w^T x_i) x_ij, i.e., the sum of error × input over the training set.
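
Putting slides 27-36 together, a hedged sketch of batch gradient descent for LMS, assuming NumPy and the X, y arrays from the first snippet; the learning rate r and the step count are arbitrary illustration values, not recommendations:

```python
import numpy as np

def lms_gradient_descent(X, y, r=1e-4, steps=5000):
    """Batch gradient descent for J(w) = 1/2 * sum_i (y_i - w^T x_i)^2."""
    w = np.zeros(X.shape[1])                  # initialize w^0
    for _ in range(steps):
        errors = y - X @ w                    # y_i - w^T x_i for every example
        grad = -X.T @ errors                  # element j is -sum_i (error_i * x_ij)
        w = w - r * grad                      # w^(t+1) = w^t - r * grad J(w^t)
    return w

# e.g. lms_gradient_descent(X, y) with the mileage data from the first snippet
```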
